






INTRODUCTION TO 


STATISTICAL 

CALCULATIONS 


By 

J. MOUNSEY, B.Sc.(Econ.) 

Formerly Vice-Principal of the College of 
Commerce, Leeds 



Printed in Great Britain for English Universities Press, Limited, by
Richard Clay and Company, Ltd., Bungay, Suffolk 



PREFACE 


This book deals, not with statistical theory, but with statistical
calculations. Its purposes are: (i) by means of numerous typical
examples, worked out and explained at length, to make the calculations
clearly understandable, and (ii) to provide a comprehensive set
of exercises, with answers, so that the reader may thoroughly test
his progress, and so gain confidence and facility in computation.

The book is designed primarily for students in Commercial and
Technical Colleges who are preparing for a wide range of examinations,
and for the use of lecturers on statistics in such Institutions.
It should prove equally useful to those outside the Colleges who
have a practical interest in statistics, whether or not they have an
examination in view. It should also be of use in Grammar Schools
where statistics is included in the mathematical syllabus.

The topics included cover a wide range for such a book, and
cater for the needs of a variety of students, some of whom require
only the earlier chapters, while others need mainly the later
chapters.

Most of the traps which beset the newcomer in this subject have
been exposed. It may be felt that the explanatory text is unnecessarily
full. It is, however, better that the writer should
labour the point rather than that the reader should miss it. No
“obvious” step in a demonstration has been omitted, since
definitely the average student of statistics, for whom this book is
intended, “non facit saltum”.

A list of examining authorities, whose computational requirements
in statistics have been wholly, or mainly, covered in this
book, is given on page vi.

I am much indebted to the Trustees of Biometrika for permission
to print Tables I and II, which have been adapted from Table II of
Tables for Statisticians and Biometricians, Part I; to Professor
Ronald A. Fisher, Cambridge, to Dr. Frank Yates, Rothamsted,
and to Messrs. Oliver & Boyd, Ltd., Edinburgh, for permission to
print Tables III, IV, V and VI, which are abridgements of Tables
IV, III, VI and V from their book: Statistical Tables for Biological,
Agricultural and Medical Research; to the Controller of
His Majesty's Stationery Office, for permission to print data taken
from numerous official publications; and to the British Standards





Institution, London, for permission to use data taken from their
publication B.S. 600 R : 1942.

I am also much indebted to the University of London, the Royal
Statistical Society, the Institute of Transport and the Society of
Incorporated Accountants and Auditors for permission to make use
of questions taken from their examination papers.

Where copyright material has been used, acknowledgement has
been made in the text. If, in spite of every care to avoid it, any
questions or data have been taken from examination papers or elsewhere
without acknowledgement, I hope that this will be excused as
due to inadvertence and to the unnoticed loss of reference to the
sources of some of the material used in lectures over many years.

The following abbreviations have been used : 

L.U. University of London. 

R.S.S. Royal Statistical Society.

Inst.T. Institute of Transport.

Soc.I.A.A. Society of Incorporated Accountants and Auditors.

List of Examining Authorities
The University of London :

Intermediate and Final B.Com., old regulations.

“Alternative Paper”: B.Sc. Econ., old regulations.
“Elementary Statistical Methods and Sources”: Part
B.Sc. Econ., new regulations.

The Royal Statistical Society : The Certificate Examination.
The Association of Incorporated Statisticians :

Examination for Registered Statistical Assistants. 

The Intermediate Examination. 

The Ordinary and Higher National Certificates in Commerce. 
Local Government Promotion Examinations. 

Commercial and Professional : 

The Association of Certified and Corporate Accountants. 

The British Institute of Management. 

The Building Societies Institute. 

The Incorporated Association of Rating and Valuation Officers. 
The Incorporated Sales Managers' Association. 

The Institute of Export. 

The Institute of Hospital Administrators. 

The Institute of Industrial Administration. 

The Institute of Municipal Treasurers and Accountants. 

The Institute of Personnel Management. 

The Institute of Transport. 



CONTENTS 


Preface

CHAPTER I. Measures of Location—Averages and Partition Values

Arithmetic Mean
    Simple series—Frequency distribution—Use of an “arbitrary” origin—Use of class interval units
Median
    Simple series—Frequency distribution
Partition Values
    By calculation—By interpolation on ogive
More Difficult Types of Frequency Distribution
    “Open ends” and irregular intervals—Calculations of mean and median—Construction of histograms
The Mode
The Geometric Mean
    Simple series—The weighted geometric mean—Geometrical progressions—The “compound-interest law”
The Harmonic Mean
    Simple series—The harmonic and geometric means of a frequency distribution
Exercise 1

CHAPTER II. Weighted Averages. Rates. Use of Averages in Connection with the Analysis of Time Series

Weighted Averages
    Use of approximate weights—The averaging of averages
Rates
    Crude and standardized rates—Gross and net reproduction rates
Time Series
    Definitions: the moving average—Long-term trend: Use of arithmetic mean; Choice of period for moving average; Use of geometric mean—Seasonal fluctuations and corrections: Use of arithmetic mean and deviations from trend; Use of geometric mean and ratios to trend; Method of quarterly averages; Method of link relatives
Exercise 2





CHAPTER III. Index Numbers

Price Index Numbers—Types
    Simple aggregate of prices—Weighted aggregate of prices—Simple arithmetic mean of price relatives—Simple geometric mean of price relatives; The time reversal test—The Chain base—Weighted arithmetic mean of price relatives—Miscellaneous examples
Quantity Index Numbers
Laspeyre and Paasche Indices
    Declared and standardized values
Other Types of Index
The Factor Reversal Test
Exercise 3


CHAPTER IV. Estimates and Limits of Error

Maximum Absolute Errors
Maximum Relative Errors
    Miscellaneous examples
Rounding Off Numbers
Biassed and Unbiassed Errors
Significant Figures and Approximations
Exercise 4
Additional Exercises, Chapters I-IV

CHAPTER V. Dispersion and Skewness

Measures of Dispersion
    Definitions—Computations of quartile, mean and standard deviation—Change of unit—Coefficients of dispersion—Relation between the standard deviation of a group and the standard deviations of its sub-groups—Summation method of computing mean and standard deviation—Standard deviation and the adjustment of examination marks—Formulae—The Lorenz curve
Measures of Skewness
Exercise 5

CHAPTER VI. Lines of “Best Fit”. Correlation

Straight Lines of Best Fit
    The normal equations and their application




The Coefficient of Correlation
    Formulae—Calculations of the coefficient; Simple series; Bivariate table; Standard error of the regression lines; Diagonal summation
Rank Correlation
Time Series
    Correlation between—Lag
Linear and Non-linear Regression
    The correlation ratios
Variance of the Sum or Difference of Two Series
Elementary Curve Fitting
    Method of least squares—Derivation and application of the normal equations; Straight lines, 2nd degree parabolas—Use of logarithms to determine curves of best fit
Exercise 6

CHAPTER VII. Calculation of Moments

Power Moments
    Definitions—Computations—Sheppard's corrections—Further computations
Factorial Moments
    Definition—Computation; Expression in terms of power moments—Summation methods of computation
Moments of a Continuous Distribution
    Parabolic—Triangular—Rectangular
Exercise 7

CHAPTER VIII. Elementary Problems in Probability

Definitions
Theoretical Probabilities
    Binomial series—Unsymmetrical coins, etc.—Sums of “spots” in tossing dice—Varying probability—Hypergeometric series—Card problems—Mathematical expectation—Line and circle problems—Derangements
Exercise 8

CHAPTER IX. Binomial, Poisson and Normal Distributions

The Binomial Distribution
    Moments of the distribution—Other constants of the distribution. General term




CHAPTER I 


MEASURES OF LOCATION-AVERAGES 
AND PARTITION VALUES 

I. The Arithmetic Mean 

(i) Let a series of n numbers be represented by

$$x_1, x_2, x_3, \ldots, x_n.$$

Then the sum of the series

$$= x_1 + x_2 + x_3 + \ldots + x_n = \Sigma x, \text{ where } \Sigma \text{ indicates summation,}$$

and the mean

$$= \frac{\Sigma x}{n} = \bar{x}.$$

Note that $\Sigma x = n\bar{x}$, i.e., that n times the mean of the series gives
the sum of the series; for example, if the average weekly wage of 10
workmen is £5, then the weekly wages bill is £50.

(ii) If in a series of numbers some or all of the items occur more
than once, then if f stands for the number of times an item occurs,
the series may be written

$$f_1 x_1, f_2 x_2, \ldots, f_n x_n.$$

The sum of this series

$$= f_1 x_1 + f_2 x_2 + \ldots + f_n x_n = \Sigma fx,$$

and its mean

$$= \frac{\Sigma fx}{\Sigma f} = \frac{\Sigma fx}{N} = \bar{x},$$

where $N = \Sigma f$ = total number of items and $\Sigma fx = N\bar{x}$.

Thus the sum of the series 2, 2, 2, 3, 3, 4, 4, 4, 5, 5 may be
written 3(2) + 2(3) + 3(4) + 2(5), and its mean as

$$\frac{3(2) + 2(3) + 3(4) + 2(5)}{3 + 2 + 3 + 2},$$

i.e.,

$$\frac{\Sigma fx}{\Sigma f} = \frac{34}{10} = 3.4.$$

(iii) Let $(x - \bar{x})$ represent the difference between any of the given
values of x and the mean of the series, that is, it represents a
“deviation” from the mean $\bar{x}$.



Then

$$\Sigma(x - \bar{x}) = (x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) + \ldots + (x_n - \bar{x})$$
$$= (x_1 + x_2 + \ldots + x_n) - n\bar{x}$$
$$= \Sigma x - n\bar{x}$$
$$= 0.$$

Similarly,

$$\Sigma f(x - \bar{x}) = f_1(x_1 - \bar{x}) + f_2(x_2 - \bar{x}) + \ldots + f_n(x_n - \bar{x})$$
$$= (f_1 x_1 + f_2 x_2 + \ldots + f_n x_n) - \bar{x}\,\Sigma f$$
$$= \Sigma fx - N\bar{x} = 0,$$

i.e., the algebraic sum of the deviations of a series of numbers from
their mean is zero.
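The zero-sum property can be checked numerically. The sketch below is only an illustration in Python; the function names `mean` and `weighted_mean` are assumptions introduced here, and exact fractions are used so that the sums come out as exactly zero rather than a small rounding error.

```python
from fractions import Fraction

def mean(values):
    """Arithmetic mean of a simple series, kept as an exact fraction."""
    return Fraction(sum(values), len(values))

def weighted_mean(table):
    """Mean of a frequency table given as {value x: frequency f}."""
    total = sum(f * x for x, f in table.items())
    n = sum(table.values())
    return Fraction(total, n)

# The series 2, 2, 2, 3, 3, 4, 4, 4, 5, 5 and the same data as a frequency table.
series = [2, 2, 2, 3, 3, 4, 4, 4, 5, 5]
table = {2: 3, 3: 2, 4: 3, 5: 2}

x_bar = mean(series)
simple_sum = sum(x - x_bar for x in series)                   # sum of (x - mean)
weighted_sum = sum(f * (x - weighted_mean(table)) for x, f in table.items())

print(x_bar, simple_sum, weighted_sum)
```

Both sums of deviations vanish, whether the items are taken singly or grouped by frequency.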

(iv) Let deviations be measured from any other number than the
mean, whether included within the range of the series or not,
represented by A, and called a “false mean” or “arbitrary origin”.
A may be either larger or smaller than $\bar{x}$, and the numerical
difference between A and $\bar{x}$ without regard to sign, $|A - \bar{x}|$, will be
called c.

Sum of deviations from A

$$= (x_1 - A) + (x_2 - A) + \ldots + (x_n - A)$$
$$= \Sigma(x - A)$$
$$= \Sigma x - nA, \text{ and writing } A = (\bar{x} \pm c),$$
$$= \Sigma x - n\bar{x} \mp nc,$$

so that, since $\Sigma x - n\bar{x} = 0$,

$$\frac{\Sigma(x - A)}{n} = \mp c,$$

where the sign of the result will show if A is greater or less than $\bar{x}$
(a positive result shows that A is less than $\bar{x}$), i.e., the
algebraic sum of the deviations from the arbitrary origin, divided
by the number of items, is the numerical difference between the
true mean and the arbitrary origin.

Similarly, it can be shown that, numerically,

$$\frac{\Sigma f(x - A)}{N} = c, \text{ i.e., } |\bar{x} - A|.$$

Example I. The mean of 3, 5, 6, 7, 4

$$= \frac{\Sigma x}{n} = \frac{25}{5} = 5.$$



Example II. The deviations of the various items from the mean in
Example I are: - 2, 0, + 1, + 2, - 1, and the algebraic sum of these
deviations is zero.

Example III. The deviations of the numbers in Example I from the
arbitrary origin 4 are: - 1, + 1, + 2, + 3, 0.

$$\frac{\Sigma(x - A)}{n} = \frac{5}{5} = +1 = |5 - 4| = c, \text{ i.e., } |\bar{x} - A|,$$
$$\bar{x} = A + c = 4 + 1 = 5.$$
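A short Python sketch may help to show that the choice of arbitrary origin does not matter. The function name `mean_from_origin` is an assumption made for this illustration; it adds the mean deviation from A back on to A, keeping the sign of the correction.

```python
def mean_from_origin(values, origin):
    """Mean found by measuring deviations from an arbitrary origin A."""
    deviations = [x - origin for x in values]
    correction = sum(deviations) / len(values)   # signed, so it may be negative
    return origin + correction

# The numbers of Example I, with the origin of Example III (A = 4)
# and two other origins, one below and one above the true mean.
example_1 = [3, 5, 6, 7, 4]
results = [mean_from_origin(example_1, a) for a in (4, 0, 10)]
print(results)
```

Each origin leads back to the same mean, 5; only the amount of arithmetic differs.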


Example IV. The mean of the eleven numbers 3, 3, 5, 5, 6, 6, 6,
7, 7, 7, 4 may be written

$$\frac{(3 \times 2) + (5 \times 2) + (6 \times 3) + (7 \times 3) + (4 \times 1)}{2 + 2 + 3 + 3 + 1}$$

$$= \frac{\Sigma fx}{\Sigma f} = \frac{59}{11} = 5\tfrac{4}{11}.$$

Here 2, 2, 3, 3, 1 are the frequencies of occurrence of the several
values of x, or their “weights”, and this is then called a “weighted”
average, as distinguished from the average in Example I, which is called a
“simple” or “unweighted” average. There is no real difference between
them in principle; the average in Example IV could obviously be
calculated from the formula $\frac{\Sigma x}{n}$, i.e., as the sum of the eleven separate
items divided by 11; but there may be a very decided difference on
the score of convenience in calculation. An average cannot actually
be unweighted, though where the weights are all the same, as when
x lb. of tea are bought at each of several different prices, they may be
disregarded.
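The equivalence of the two forms of the average can be sketched in Python as follows. The variable names are illustrative, and the three prices in the second part are hypothetical figures chosen only to show that equal weights cancel out.

```python
from fractions import Fraction

# Example IV: the eleven items, and the same data as value -> weight (frequency).
items = [3, 3, 5, 5, 6, 6, 6, 7, 7, 7, 4]
weights = {3: 2, 5: 2, 6: 3, 7: 3, 4: 1}

simple = Fraction(sum(items), len(items))
weighted = Fraction(sum(x * f for x, f in weights.items()), sum(weights.values()))

# Hypothetical prices, each bought in the same quantity (weight 4):
prices = [30, 36, 42]
equal_weighted = Fraction(sum(4 * p for p in prices), 4 * len(prices))
unweighted = Fraction(sum(prices), len(prices))

print(simple, weighted, equal_weighted, unweighted)
```

The weighted and simple forms agree at 59/11, and with equal weights the weighted mean of the prices is the same as their simple mean.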

Example V. Find the mean of 14.2, 13.5, 14.6, 13.7, 14.0.

(i) $$\bar{x} = \frac{\Sigma x}{n} = \frac{70.0}{5} = 14.0.$$

(ii) Let the “assumed mean”, A, = 13.

Then

$$\frac{\Sigma(x - A)}{n} = \frac{1.2 + 0.5 + 1.6 + 0.7 + 1.0}{5} = 1.0,$$
$$\bar{x} = 13 + 1 = 14.$$





Example VI. Find the mean age of the persons in the following
frequency distribution.

Years of       No. of        Deviations
age            persons       from 16.5
(x)            (f)           (x - A)        f(x - A)

14-             25             - 2            - 50
15-             40             - 1            - 40
16-             62               0               0
17-             55             + 1            + 55
18-             28             + 2            + 56
               ---                            ----
               210                            + 21

$\Sigma f = 210$, $A = 16.5$; $\Sigma f(x - A) = -90 + 111 = +21$;

$$\bar{x} = A + \frac{\Sigma f(x - A)}{N} = 16.5 + \frac{21}{210} = 16.6 \text{ years.}$$


In the above example the variable, x, is years of age. The class
interval is one year. The mid-points of the successive intervals are
14.5, 15.5, . . . 18.5 years.

The mean age is calculated on the assumption that the 25 persons
of age 14 and under 15 have an average age of 14.5 years; the 40
persons in the next interval an average age of 15.5 years, and so on.

The mean age could, of course, be calculated directly as follows:

14.5 x 25 =  362.5
15.5 x 40 =  620.0
16.5 x 62 = 1023.0
17.5 x 55 =  962.5
18.5 x 28 =  518.0
       ---  ------
       210  3486.0

$$\text{Mean age} = \frac{3486}{210} = 16.6 \text{ years.}$$

The “short-cut” method, i.e., measuring deviations from an
arbitrary origin and making the necessary correction to find the
true mean is, however, in most cases much less laborious, and should
generally be used.
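The two routes to the mean of Example VI can be compared in a few lines of Python. This is a sketch only; `shortcut_mean` and `direct_mean` are names assumed for the illustration.

```python
# Example VI: mid-points of the age intervals and their frequencies.
midpoints = [14.5, 15.5, 16.5, 17.5, 18.5]
frequencies = [25, 40, 62, 55, 28]

def shortcut_mean(midpoints, frequencies, origin):
    """Mean = A + (sum of f(x - A)) / N, with A an arbitrary origin."""
    n = sum(frequencies)
    total_deviation = sum(f * (x - origin) for x, f in zip(midpoints, frequencies))
    return origin + total_deviation / n

def direct_mean(midpoints, frequencies):
    """Mean = (sum of fx) / N, i.e. the same formula with the origin at zero."""
    return sum(f * x for x, f in zip(midpoints, frequencies)) / sum(frequencies)

short = shortcut_mean(midpoints, frequencies, origin=16.5)
direct = direct_mean(midpoints, frequencies)
print(round(short, 1), round(direct, 1))
```

With the origin at 16.5 the products are small whole numbers, which is the whole advantage when the work is done by hand; the answer is 16.6 years either way.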

The arbitrary origin should be taken at the mid-point of an interval,
and the interval selected should be that one in which it appears by
inspection that the mean is likely to be located. It is not always
possible to hit on the actual interval which contains the mean, but
the nearer the interval selected is to it, the less arithmetical work
will be involved. Naturally, the answer will be the same whichever
interval is taken.

The reader should note that the straightforward method of obtaining
the mean is simply equivalent to taking the arbitrary origin at
zero. In the previous example $\frac{\Sigma f(x - A)}{N}$ is then $\frac{3486}{210} = 16.6$ and
mean $\bar{x} = 0 + 16.6$ years.



Example VII. Find the mean height in the following frequency
distribution.

Height in inches
(mid-point of        Frequency
interval) (x)           (f)         (x - 66)       f(x - 66)

      60                  2           - 6            - 12
      62                  8           - 4            - 32
      64                 17           - 2            - 34
      66                 25             0               0
      68                 18           + 2            + 36
      70                 11           + 4            + 44
      72                  4           + 6            + 24
                         --
                         85

Sum of the negative products = - 78; sum of the positive products = + 104.

$$\text{Mean} = 66 + \frac{104 - 78}{85} = 66 + \frac{26}{85} = 66.31 \text{ inches.}$$


In the above example the interval is 2 inches. Where the interval
is not unity, it is preferable to do the working in “class intervals”,
as follows.




Example VIII.

                        (Unit: 2")
      x          f       (x - 66)       f(x - 66)

      60          2        - 3            - 6
      62          8        - 2            - 16
      64         17        - 1            - 17
      66         25          0               0
      68         18        + 1            + 18
      70         11        + 2            + 22
      72          4        + 3            + 12
                 --
                 85

Sum of the negative products = - 39; sum of the positive products = + 52.

$$\frac{\Sigma f(x - A)}{N} = \frac{13}{85} \text{ in class interval units}$$
$$= \frac{13}{85} \times 2 \text{ in the original units} = \frac{26}{85},$$
$$\bar{x} = 66 + \frac{26}{85} = 66.31 \text{ inches.}$$

Where intervals are wide and frequencies large this method saves 
much labour and on this, and on other grounds, the student should 
always employ it. 
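The working in class-interval units can be sketched in Python as below. The function name `mean_in_class_units` is an assumption for this illustration; exact fractions are used so that the correction appears as 13/85 before it is converted back to inches.

```python
from fractions import Fraction

# Examples VII and VIII: mid-points (inches) and frequencies.
heights = [60, 62, 64, 66, 68, 70, 72]
frequencies = [2, 8, 17, 25, 18, 11, 4]

def mean_in_class_units(midpoints, frequencies, origin, width):
    """Return (correction in class-interval units, mean in original units)."""
    steps = [Fraction(x - origin, width) for x in midpoints]   # -3, -2, ..., +3
    n = sum(frequencies)
    correction_units = sum(f * d for d, f in zip(steps, frequencies)) / n
    return correction_units, origin + correction_units * width

correction, mean_height = mean_in_class_units(heights, frequencies, origin=66, width=2)
print(correction, round(float(mean_height), 2))
```

The deviations become the small integers -3 to +3, and the single multiplication by the interval width is left to the end.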

(v) Let $\bar{x}_1$ and $n_1$ be the mean and frequency of one series of
numbers and $\bar{x}_2$ and $n_2$ the mean and frequency of another series.

Then the frequency of the combined series is

$$n_1 + n_2 = N, \text{ say,}$$

and the mean of the combined series is

$$\frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \bar{X}, \text{ say.}$$


Example IX. The mean wage of 150 men was £2 10s. per week, and
that of another 200 men was £3 per week. What was the mean wage
of the 350 men considered as one group?

$$\text{Mean wage} = \frac{(150 \times 2.5) + (200 \times 3)}{350} = £2.786, \text{ or about £2 15s. 9d. per week.}$$

This can, of course, be extended to any number of groups.

Example X. The mean age of a group of 100 children was 9.35 years.
The mean age of 25 of them was 8.75 years and that of another 65 was
10.51 years. What was the mean age of the remainder?

$$\text{Mean age} = \frac{(100 \times 9.35) - \{(25 \times 8.75) + (65 \times 10.51)\}}{100 - (25 + 65)} = \frac{33.1}{10} = 3.31 \text{ years,}$$

i.e., in symbols,

$$\frac{N\bar{X} - (n_1\bar{x}_1 + n_2\bar{x}_2)}{N - (n_1 + n_2)}.$$
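Both uses of the combined-mean formula can be sketched in Python. The function names `combined_mean` and `remainder_mean` are assumptions for this illustration; each group is passed as a (mean, number of items) pair.

```python
def combined_mean(groups):
    """Mean of several groups taken together; groups is a list of (mean, n)."""
    total = sum(m * n for m, n in groups)
    count = sum(n for _, n in groups)
    return total / count

def remainder_mean(overall, known):
    """Mean of the items left over when the overall (mean, n) and the
    (mean, n) of some sub-groups are known."""
    overall_mean, overall_n = overall
    known_total = sum(m * n for m, n in known)
    known_n = sum(n for _, n in known)
    return (overall_mean * overall_n - known_total) / (overall_n - known_n)

# Example IX: wages in pounds (GBP 2 10s. = 2.5).
wage = combined_mean([(2.5, 150), (3.0, 200)])
# Example X: ages in years.
age = remainder_mean((9.35, 100), [(8.75, 25), (10.51, 65)])
print(round(wage, 3), round(age, 2))
```

The first call gives about £2.786 for the 350 men, and the second gives 3.31 years for the remaining 10 children.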

It may be noted that the mean of a distribution is the “x”
abscissa of the vertical which passes through the centre of gravity—
or, more properly, centre of area—of the frequency curve. The
“median”, on the other hand, is that value of the variable, x, the
vertical through which divides the area of the frequency curve into
two equal parts.







II. The Median 

The median height of a group of men is the height of the middle
individual when the men are arranged in order of height, if there are
an odd number of men.

E.g., suppose the heights of 5 men to be 6' 0", 5' 9", 5' 6", 5' 8",
5' 10".

Arraying these heights we have

5' 6", 5' 8", 5' 9", 5' 10", 6' 0".

Median height = 5' 9".

If another man, 6' 2" in height, is added to the group the array
becomes:

5' 6", 5' 8", 5' 9", 5' 10", 6' 0", 6' 2".

There is now no middle man, and the median height of the group
is commonly taken as the mean between the heights of the two midmost
men, numbers 3 and 4 in the above case, and is consequently
5' 9½".

The reader should note that the height of the additional man, so
long as he does not affect the position of the two central men, has
no effect whatever on the median height. So long as he is taller
than 5' 10" the median remains 5' 9½". If, however, he had been
5' 8" or less, the median height of the group would have been 5' 8½".
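The rule for a simple series can be sketched in Python as follows, with the heights converted to inches (6' 0" = 72, and so on). The function name `median` is chosen for the illustration and is not meant to stand for any particular library routine.

```python
def median(values):
    """Median of a simple series: the middle item of the array, or the mean
    of the two midmost items when the number of items is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

five_men = [72, 69, 66, 68, 70]          # 6'0", 5'9", 5'6", 5'8", 5'10"
odd_case = median(five_men)              # five men
tall_extra = median(five_men + [74])     # a sixth man of 6'2"
short_extra = median(five_men + [68])    # a sixth man of 5'8" instead
print(odd_case, tall_extra, short_extra)
```

The three results are 69, 69.5 and 68.5 inches, i.e., 5' 9", 5' 9½" and 5' 8½", as in the text.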


The Median of a Frequency Distribution 

In dealing with a simple series of N items the median item is
correctly the $\left(\frac{N+1}{2}\right)$th, and the median value of the variable is
the value of the $\left(\frac{N+1}{2}\right)$th item.

In finding the median value from a grouped distribution the
median item is often stated to be the $\left(\frac{N+1}{2}\right)$th, as in a simple
series, but the correct procedure is to take it as the $\frac{N}{2}$th item. This
ensures that the same result is obtained whichever end of the table
is used as a starting point. If $\frac{N+1}{2}$ is used, the two results will
differ. This point is illustrated below.





It should be noted that the median value of a grouped distribution
will differ, more or less, from the value which would be obtained
from the full data—i.e., the complete array—since grouping always
entails loss of detail.

Consider the following frequency table and histogram:

Age in years (x):   10-   12-   14-   16-   18-   20-   22-
Numbers (f):          3     9    16    26    20    10     5      Σf = 89

[Fig. 1. Histogram of the distribution, with the cumulative frequency
reckoned from the lower end: 0, 3, 12, 28, 54, 74, 84, 89.]

The median age is that of the $\frac{89}{2}$th, i.e., the 44.5th, person. The
cumulated frequencies show that 28 persons are under 16 years and
that 54 are under 18 years.

The 44.5th person is therefore in the interval “16 and under 18”.

$$\text{The median age} = 16 + \frac{44.5 - 28}{26} \text{ of } 2 = 17.269 \text{ years,}$$

since the interval is 2 years and there are 26 persons in this interval.

Cumulating the frequencies from the other end of the table we
get the following (see Fig. 2).

It will be seen that 61 persons are over 16 years and 35 persons
over 18 years. The 44.5th person is therefore in the interval “16
and less than 18”.

$$\text{Median} = 18 - \frac{(44.5 - 35)2}{26} = 17.269 \text{ years.}$$

[Fig. 2. Histogram of the distribution, with the cumulative frequency
reckoned from the upper end: 89, 86, 77, 61, 35, 15, 5, 0.]

If the median person had been taken as the $\left(\frac{N+1}{2}\right)$th, i.e., as
the 45th, then starting from the left of the histogram the median
age would have been 17.307 years, and starting from the right
17.231 years.
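The point about N/2 and (N + 1)/2 can be verified with a short Python sketch. The frequencies used are those consistent with the cumulated figures quoted in the text (28 persons under 16, 54 under 18, 61 over 16, 35 over 18, total 89); the function names are assumptions for this illustration.

```python
lower_limits = [10, 12, 14, 16, 18, 20, 22]
frequencies = [3, 9, 16, 26, 20, 10, 5]      # total 89
width = 2

def median_from_below(position):
    """Count up from the youngest group to the item at `position`."""
    cumulative = 0
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= position:
            return lower + (position - cumulative) * width / f
        cumulative += f
    raise ValueError("position exceeds the total frequency")

def median_from_above(position):
    """Count down from the oldest group to the item at `position`."""
    cumulative = 0
    for lower, f in zip(reversed(lower_limits), reversed(frequencies)):
        if cumulative + f >= position:
            return (lower + width) - (position - cumulative) * width / f
        cumulative += f
    raise ValueError("position exceeds the total frequency")

n = sum(frequencies)
half = (median_from_below(n / 2), median_from_above(n / 2))
half_plus = (median_from_below((n + 1) / 2), median_from_above((n + 1) / 2))
print(half, half_plus)
```

With N/2 = 44.5 both directions give 17.269 years; with (N + 1)/2 = 45 they give about 17.308 and 17.231 years, which is the disagreement described above (the text quotes the first figure as 17.307).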

The usual methods of obtaining an estimate of the median will 
now be illustrated. 

Example XI.

Weight in lb.          No. of
(x)                    persons (f)     Cumulative frequency (F)

40 and under 45            15           15 items are under 45 lb.
45  „    „   50            25           40   „    „    „   50  „
50  „    „   55            40           80   „    „    „   55  „
55  „    „   60            55          135   „    „    „   60  „
60  „    „   65            50          185   „    „    „   65  „
65  „    „   70            35          220   „    „    „   70  „
70  „    „   75            25          245   „    „    „   75  „
                          ---
                          245

The median item is the $\frac{245}{2}$th, i.e., the 122.5th.

The last item in the “50 and under 55” interval * is the 80th,
leaving 42.5 items, i.e., the median is in the “55 and under 60” interval.
There are 55 items in this interval, which covers 5 lb.;

$$\therefore \text{ median weight} = 55 + \frac{(122.5 - 80)5}{55} = 58.86, \text{ say } 58.9 \text{ lb.}$$


III. Partition Values 

Other useful “statistics”, or measures descriptive of a distribution,
can be obtained from a frequency table in a similar manner,
and for convenience their calculation will be considered here.


The Quartiles

A distribution is divided into four equal parts by the two quartiles
and the median.

In a simple series of N items the first quartile item is the $\left(\frac{N+1}{4}\right)$th
and the third quartile item the $\frac{3(N+1)}{4}$th.

In a frequency distribution the $\frac{N}{4}$th and the $\frac{3N}{4}$th items give the
required values.

Example XII. Find the quartile weights of the distribution of
Example XI.

The first quartile item is the $\frac{245}{4}$th, i.e., the 61.25th.

The end of the “45 and under 50” interval is the 40th, leaving 21.25.

$\therefore$ $Q_1$ is in the “50 and under 55” interval, which contains 40 items,
and is 5 lb. in width.

$$Q_1 = 50 + \frac{21.25 \times 5}{40} = 52.66 \text{ lb.}$$


* See the Note on limits of intervals and values of mid-points, below.





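Similarly, the third quartile item is the $\frac{3 \times 245}{4}$th, i.e., the 183.75th,
which is in the “60 and under 65” interval, and

$$Q_3 = 60 + \frac{(183.75 - 135)5}{50} = 64.88 \text{ lb.}$$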
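The median and quartile calculations of Examples XI and XII all follow one pattern, which may be sketched in Python. The function name `partition_value` is an assumption for this illustration; the frequencies 35 and 25 for the last two intervals are those implied by the cumulative totals 185, 220 and 245.

```python
def partition_value(lower_limits, width, frequencies, fraction):
    """Value below which the given fraction of a grouped distribution lies,
    taking the (fraction x N)th item and interpolating within its interval."""
    position = fraction * sum(frequencies)
    cumulative = 0
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= position:
            return lower + (position - cumulative) * width / f
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")

# Example XI: weights in lb., intervals 5 lb. wide.
limits = [40, 45, 50, 55, 60, 65, 70]
persons = [15, 25, 40, 55, 50, 35, 25]

q1 = partition_value(limits, 5, persons, 1 / 4)
med = partition_value(limits, 5, persons, 1 / 2)
q3 = partition_value(limits, 5, persons, 3 / 4)
print(q1, med, q3)
```

The same function serves for quintiles, deciles and percentiles by changing the fraction, e.g. 3/5 for the third quintile or 8/10 for the eighth decile.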

Note.—Limits of intervals and values of mid-points of intervals.

The actual upper and lower limits of the intervals in Example XI
depend on the method adopted in obtaining the weights.

If weights were taken to the nearest lb. then the limits are not
40 lb. and under 45 lb. etc., but

39.5 lb. and under 44.5 lb.
44.5 lb. and under 49.5 lb., etc.

with mid-points 42, 47 and so on.

If the weights were correct to the nearest ½ lb. the intervals are
39¾ and under 44¾, 44¾ and under 49¾, etc.

If the weights in Example XI were correct to the nearest ½ lb.
then the values obtained above for the median and quartiles should
each be reduced by ¼ lb.

Where no indication is given in a frequency table of the degree of
accuracy of the measurements, e.g.,

height 60" and under 61"
       61"  „    „   62"

the limits may be taken as stated, with mid-points at 60.5", 61.5",
etc. This assumption makes no difference in the calculation of the
mean, median, etc.; and if it is afterwards found that the measurements,
in the above case, for example, were correct to the nearest
½" (giving limits 59¾"-60¾" with mid-point 60¼") all that is
necessary is to subtract ¼" in each case from the results obtained.


Quintiles, Deciles, Percentiles 

A distribution may be divided into five equal parts by the four
quintiles, into ten equal parts by the nine deciles, or into one
hundred equal parts by the ninety-nine percentiles. The various
quintile, decile or percentile values of the variable can be obtained
by the same method as that employed above to obtain the median
and quartile values. It is often advisable to convert the tabular
frequencies into percentages.


Use of Diagrams 





Example XIII.

Weekly wage        No. of wage-
in dollars (x)     earners (f)      Cumulative frequencies (F)

30-                     2             2 earn under 32 dollars
32-                     9            11   „    „    34   „
34-                    25            36   „    „    36   „
36-                    30            66   „    „    38   „
38-                    49           115   „    „    40   „
40-                    62           177   „    „    42   „
42-                    39           216   „    „    44   „
44-                    20           236   „    „    46   „
46-                    11           247   „    „    48   „
48-50                   3           250   „    „    50   „
                      ---
                      250





[Fig. 3. Ogive: number of earners getting less than x dollars per week,
plotted against weekly wage in dollars (x).]

Draw a cumulative frequency curve, or ogive, to illustrate the above
data, and from it determine:

(a) The median and quartile wages.

(b) The third quintile and the eighth decile wages.

(c) The number of wage-earners receiving between $37 and $45 per
week.

For the method of obtaining such information, see the diagram.

By Calculation:

The 3rd quintile is the $\frac{3}{5}(250)$th item = 150th.

$$\therefore \text{ 3rd quintile} = 40 + \frac{(150 - 115)2}{62} = 41.13.$$

The 8th decile is the $\frac{8}{10}(250)$th item = 200th.

$$\therefore \text{ 8th decile} = 42 + \frac{(200 - 177)2}{39} = 43.18.$$

The median is 40.3 and the 3rd quartile is 42.5.

From the Graph:

Number earning less than $45 is 227.
   „      „     „    „   $37 „   50.

$\therefore$ number earning between $37 and $45 is 177.
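Reading the ogive at a given wage can be imitated by straight-line interpolation between the plotted points. The Python sketch below does this for part (c); the function name `number_below` is an assumption, and the straight-line rule is only an approximation to a reading taken from a smoothly drawn curve.

```python
# Example XIII: upper limits reached and the cumulative numbers below them.
limits = [30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50]
cumulative = [0, 2, 11, 36, 66, 115, 177, 216, 236, 247, 250]

def number_below(wage):
    """Number earning less than `wage`, by straight-line interpolation
    between adjacent points of the cumulative frequency table."""
    for i in range(len(limits) - 1):
        lo, hi = limits[i], limits[i + 1]
        if lo <= wage <= hi:
            share = (wage - lo) / (hi - lo)
            return cumulative[i] + share * (cumulative[i + 1] - cumulative[i])
    raise ValueError("wage lies outside the range of the table")

below_45 = number_below(45)
below_37 = number_below(37)
between = below_45 - below_37
print(below_45, below_37, between)
```

Straight-line interpolation gives 226 and 51, hence 175 earners between $37 and $45, against the 227, 50 and 177 read from the curve; the small differences arise because the drawn ogive is curved within each interval.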

It is often desirable to reduce the frequencies to a per cent. or
per mille basis before drawing the ogive. If two distributions are
to be compared then this reduction is, in general, essential.

The frequencies in the above example have been cumulated on a
“less than” basis. Obviously they could have been cumulated
on a “more than” basis, as follows:

250 earn more than 30 dollars
248   „    „    „   32   „
239   „    „    „   34   „
214   „    „    „   36   „
 .    .    .    .    .   .
 34   „    „    „   44   „
 14   „    „    „   46   „
  3   „    „    „   48   „
  0   „    „    „   50   „

If both sets of cumulated figures are plotted on the same diagram,
and on the same scales, the resulting curves will intersect at the
median.

It is a matter of indifference by which of these methods the
cumulative frequency curve is drawn.

The median value of the variable, or other partition value,
obtained from a carefully drawn graph, which exhibits a definite
curvature, is more reliable than that obtained by calculation from
the frequency table, since, in the calculation, the crude assumption
is made that the frequency is uniformly distributed, in the manner
of a column, over the several intervals.

IV. More Difficult Types of Frequency Distribution 

The calculation of the mean, median and other statistics, in the
case of frequency distributions with “open ends”, unequal intervals
and large frequencies, has certain difficulties which are illustrated
in the following examples:


Example XIV. Distribution of coal mines according to number of
wage-earners employed.

Number of
wage-earners    Number of    Mid-point                    (x - 374.5)    f(x - 374.5)
employed        mines        of interval                  -----------    ------------
(x)             (f)                        (x - 374.5)        125            125

Under 20          408            10          - 364.5        - 2.916       - 1189.73
20-49             201            34.5        - 340          - 2.72        -  546.72
50-99             117            74.5        - 300          - 2.40        -  280.80
100-249           148           174.5        - 200          - 1.60        -  236.80
250-499           222           374.5            0             0               0
500-749           180           624.5        + 250          +  2          +  360
750-999           126           874.5        + 500          +  4          +  504
1000-1499         123          1249.5        + 875          +  7          +  861
1500-1999          71          1749.5        + 1375         + 11          +  781
2000-2499          19          2249.5        + 1875         + 15          +  285
2500-2999          15          2749.5        + 2375         + 19          +  285
3000 and over       4          3499.5        + 3125         + 25          +  100
                 ----
                 1634

Sum of the negative products = - 2254.05; sum of the positive products = + 3176.

[Source: Ministry of Fuel and Power]

The range must first be assumed; in the present case the first interval
has been taken as 1-19, and the last interval as 3000-3999. In the
absence of special information common sense is the only guide. The
absence of upper and lower limits does not affect the calculation of
quantiles, and will not much affect the calculation of the mean unless
the frequencies in the indeterminate intervals form a considerable
percentage of the total frequency.

$$\text{Mean} = 374.5 + \frac{922 \times 125}{1634} = 445 \text{ wage-earners,}$$

where 922 is the net sum of the products, 3176 - 2254.05, to the nearest unit.

The total number of wage-earners employed in the 1634 mines is
stated in a Ministry of Fuel report as 714,750, i.e., an average of 437
per mine. For a skew distribution of this type the agreement is close.

The median number of wage-earners determined from the table is

$$100 + \frac{(817 - 726)150}{148} = 192.$$

In a distribution of this type this great difference between mean and
median is to be expected, and a study of the above calculations will
make the reason for the difference clear.
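The mean and median of Example XIV can be recomputed with the Python sketch below. It adopts the same assumptions as the text for the open ends (first interval 1-19, last interval 3000-3999); the function name `grouped_median` and the variable names are illustrative only.

```python
# Example XIV: intervals (inclusive limits) and numbers of mines.
intervals = [(1, 19), (20, 49), (50, 99), (100, 249), (250, 499), (500, 749),
             (750, 999), (1000, 1499), (1500, 1999), (2000, 2499),
             (2500, 2999), (3000, 3999)]
mines = [408, 201, 117, 148, 222, 180, 126, 123, 71, 19, 15, 4]

midpoints = [(lo + hi) / 2 for lo, hi in intervals]     # 10, 34.5, 74.5, ...
total = sum(mines)

# Mean by the short-cut method: origin 374.5, working unit 125.
origin, unit = 374.5, 125
net = sum(f * (x - origin) / unit for x, f in zip(midpoints, mines))
mean = origin + net * unit / total

def grouped_median(intervals, freqs):
    """Median taken as the (N/2)th item, interpolated within its interval."""
    position = sum(freqs) / 2
    cumulative = 0
    for (lo, hi), f in zip(intervals, freqs):
        if cumulative + f >= position:
            width = hi - lo + 1          # e.g. 100-249 covers 150 values
            return lo + (position - cumulative) * width / f
        cumulative += f
    raise ValueError("empty distribution")

median = grouped_median(intervals, mines)
print(round(mean), round(median), round(714750 / total))
```

The sketch gives a mean of about 445 and a median of about 192, against the reported average of 437; the few very large mines pull the mean far above the median.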


Histograms 

The construction of a histogram, or column diagram, in the case 
of a frequency distribution with equal intervals presents no diffi¬ 
culties. The histogram is always an area diagram, i.e., its area 
equals the total frequency to some scale, but all that is required 
here is to make the column heights proportional to the interval 
frequencies. 

When the intervals are not equal, the heights of the columns will 
obviously not be proportional to the frequencies, and the deter¬ 
mination of the heights demands a little thought. 

Suppose it is required to draw a histogram of the above distribution
and that the horizontal scale is to be 1" = 400 wage-earners,
and the area scale 1 sq. in. = 400 coal mines.

      (1)                      (2)                        (3)
Width of columns         Area of columns          Height of columns on
on diagram               on diagram               diagram

   20/400"            408/400 = 1.02 sq. ins.     1.02 x 400/20 = 20.4"
   30/400"            201/400 = 0.5025   „        0.5025 x 400/30 = 6.7"
   50/400"            117/400 = 0.2925   „        0.2925 x 8 = 2.34"
  150/400"            148/400 = 0.37     „        0.37 x 400/150 = 0.986"

and so on.

The entries in cols. (1) and (2) are fixed by the scales selected and
col. (3) is simply $\frac{\text{area}}{\text{width}}$ = height.

The easiest way to find the relative heights of the columns is to
divide each frequency by the corresponding interval, e.g.,

$$\frac{408}{20} = 20.4, \qquad \frac{201}{30} = 6.7, \text{ etc.}$$

If the base scale is made any convenient size and the heights of
the columns made proportional to 20.4, 6.7, etc., the areas of the
columns will represent the frequencies to an area scale which can
be easily determined.
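The rule "height = area / width" can be sketched in Python for the first four intervals of Example XIV. The function name `column_heights` and its parameter names are assumptions for this illustration; the two scales default to those chosen in the text.

```python
def column_heights(frequencies, widths, base_scale=400, area_scale=400):
    """Heights in inches of histogram columns, given that 1 inch of base
    represents `base_scale` units of x and 1 sq. in. represents
    `area_scale` units of frequency."""
    heights = []
    for f, w in zip(frequencies, widths):
        width_in = w / base_scale        # col. (1): width on the diagram
        area_sq_in = f / area_scale      # col. (2): area on the diagram
        heights.append(area_sq_in / width_in)   # col. (3): area / width
    return heights

freqs = [408, 201, 117, 148]     # mines in the first four intervals
widths = [20, 30, 50, 150]       # interval widths in wage-earners

heights = column_heights(freqs, widths)
relative = [f / w for f, w in zip(freqs, widths)]
print([round(h, 3) for h in heights], [round(r, 3) for r in relative])
```

Because the base and area scales here are both 400, the heights in inches coincide with the simple quotients frequency / interval; with other scales the two lists would differ only by a constant factor.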




Example XV.

          Retailers of Chocolate and Sugar Confectionery
            Analysis by Size of Outlet at Nov. 1945.

No. of 4-weekly     Independent grocers                  (x - 499.5)    f(x - 499.5)
rations sold (x).   and general food      Weight (f).    -----------    ------------
                    shops (f).                               100            100

Under 50                 37,125              3,712         - 4.745      - 17,613.4
  50-  99                30,806              3,081         - 4.25       - 13,094.2
 100- 199                25,677              2,568         - 3.5        -  8,988.0
 200- 399                11,455              1,146         - 2.0        -  2,292.0
 400- 599                 2,355                236           0                 0
 600- 799                   780                 78         + 2.0        +    156
 800- 999                   350                 35           4.0             140
1000-1499                   275                 28           7.5             210
1500-1999                    64                  6          12.5              75
2000-2499                    26                  3          17.5              52.5
2500-2999                    17                  2          22.5              45
*3000 and over               51                  5          27.5             137.5
                        -------             ------                      ----------
                        108,981             10,900                      - 41,987.6
                                                                        +    816

* Taken as 3000-3499.

[Annual Abstract of Statistics 1937-47, Source: Ministry of Food]

$$\text{Mean} = 499.5 - \frac{41{,}172 \times 100}{10{,}900} = 122.$$

The average is stated in the Ministry of Food Report to be 118 four-
weekly rations sold. The median number of four-weekly rations lies
in the 50-99 class, at a distance of

$$\frac{(54{,}491 - 37{,}125) \times 50}{30{,}806}$$

or about 28, above the lower limit of that class, i.e., at about 78.

When the frequencies are large the work may be shortened by
using as the multipliers of the deviations (i.e., as “weights”)
numbers approximately proportional to the frequencies. Thus in
the above example the frequencies have been written to the nearest
ten. The use of approximate “weights” is discussed in Chapter II.
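
The effect of replacing the frequencies by approximate weights can be seen by working Example XV both ways. The sketch below is an illustration in Python, not part of the original text; the class mid-points are assumed from the tabulated deviations (25 for “under 50”, 74.5 for 50-99, and so on) and the function name is hypothetical.

```python
# Mean of Example XV by the assumed-origin method, using first the full
# frequencies and then the shortened weights from the table.
midpoints = [25, 74.5, 149.5, 299.5, 499.5, 699.5,
             899.5, 1249.5, 1749.5, 2249.5, 2749.5, 3249.5]
frequencies = [37125, 30806, 25677, 11455, 2355, 780,
               350, 275, 64, 26, 17, 51]
weights = [3712, 3081, 2568, 1146, 236, 78, 35, 28, 6, 3, 2, 5]

ORIGIN = 499.5   # arbitrary origin used in the table
UNIT = 100       # deviations are expressed in hundreds


def mean_from_origin(mids, w, origin, unit):
    """Origin plus the weighted mean deviation, scaled back by the unit."""
    total = sum(wi * (m - origin) / unit for m, wi in zip(mids, w))
    return origin + unit * total / sum(w)


mean_exact = mean_from_origin(midpoints, frequencies, ORIGIN, UNIT)
mean_approx = mean_from_origin(midpoints, weights, ORIGIN, UNIT)

print(f"full frequencies:    {mean_exact:.2f}")
print(f"approximate weights: {mean_approx:.2f}")
```

Both results round to 122, so the shortened weights cost nothing at the accuracy to which the answer is stated.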


V. The Mode 

The “mode” is the “most fashionable” value of the variable,
i.e., the value of most frequent occurrence. In a frequency curve
the modal value of the variable is the x value for which the
frequency is greatest, and it is usually possible to locate the mode
roughly from a frequency table or diagram. It is, however,
usually difficult to locate with any accuracy. If a distribution is
symmetrical, mean, median and mode coincide. If the distribution
is somewhat “skew” (i.e., lopsided or unsymmetrical) the mean
and median do not coincide with the mode or with each other. The
median depends on the numbers in the group only, whereas the mean
is affected by the extra “weight” of extreme items. Consequently
in a skewed distribution the mean is farther away from the mode
than is the median, as is illustrated below.




Fig. 4

In a moderately skewed distribution the relationship

$$\text{Mode} = \text{Mean} - 3(\text{Mean} - \text{Median})$$

may be used to obtain the approximate value of the mode.

In a cumulative frequency diagram the value of x at which
the slope of the curve is a maximum is the modal value. This is
the point of inflection or “reverse curvature” of the ogive. Since
it is, however, in general impossible to locate this point with any
pretence to accuracy, this fact is of theoretical interest only.
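
The approximate relation between mode, mean and median is easily put into a one-line function. The sketch below is illustrative only; the function name and the numerical values are hypothetical and are not taken from the text.

```python
def approximate_mode(mean, median):
    """Mode = Mean - 3(Mean - Median), for a moderately skewed distribution."""
    return mean - 3 * (mean - median)


# Hypothetical figures: the mean lies above the median (positive skew),
# so the estimated mode falls below both.
mode_skew = approximate_mode(30.0, 29.0)

# In a symmetrical distribution mean and median coincide, and the
# relation returns the same value for the mode.
mode_symmetric = approximate_mode(12.0, 12.0)

print(mode_skew, mode_symmetric)
```

The relation should not be applied to a markedly skew distribution; with a mean far from the median it can return a value outside the range of the data.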

VI. The Geometric Mean 

The geometric mean of a series of numbers, $x_1, x_2, \ldots x_n$, is the
nth root of the continued product of the numbers.

For example, the geometric mean of 3, 4, 6, 9 is $\sqrt[4]{3 \times 4 \times 6 \times 9}$,
or $\log \text{G.M.} = \tfrac14(\log 3 + \log 4 + \log 6 + \log 9)$,

i.e., the logarithm of the G.M. of a series of numbers is the
arithmetic mean of their logarithms.

If one of the values of the variable is zero, then the G.M. is zero
also.

The geometric mean of a series of different positive numbers is always
less than the arithmetic mean of the series, and in consequence is
often used when it is desirable that undue importance shall not be
given to large values of the variable. The geometric mean is the
average used in the compilation of some index numbers, and will
be further considered in that connection.
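
The logarithmic rule for the geometric mean translates directly into a short routine. What follows is a minimal sketch in Python, assuming positive values (or a zero, which is handled separately); the function name is illustrative and not from the text.

```python
import math


def geometric_mean(values):
    """Antilog of the arithmetic mean of the logarithms of the values."""
    if any(v == 0 for v in values):
        return 0.0                      # a single zero makes the G.M. zero
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log


numbers = [3, 4, 6, 9]
gm = geometric_mean(numbers)
am = sum(numbers) / len(numbers)

print(f"G.M. = {gm:.3f}, A.M. = {am:.3f}")
```

For the numbers 3, 4, 6, 9 the geometric mean falls below the arithmetic mean of 5.5, as the text states it must for different positive numbers.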








Example XVI. Find the G.M. of the numbers 11.5, 12.7, 14.6.

$$\text{G.M.} = \sqrt[3]{11.5 \times 12.7 \times 14.6} = 12.87$$

    log 11.5 = 1.0607
    log 12.7 = 1.1038
    log 14.6 = 1.1644
               ------
            3 )3.3289
               1.1096

Example XVII. Find the G.M. of the numbers 11.5, 12.7 and 14.6
if these numbers are weighted by 3, 4 and 2, respectively.

$$\text{G.M.} = \sqrt[9]{(11.5)^3 \times (12.7)^4 \times (14.6)^2}$$

i.e., $\log \text{G.M.} = \tfrac19(3 \log 11.5 + 4 \log 12.7 + 2 \log 14.6) = 1.1029$

i.e., G.M. = 12.68.

Example XVIII. The G.M. of six numbers is 75, and the G.M. of four
of them is 67. Find the G.M. of the other two.

$$\text{G.M. required} = \sqrt{75^6 \div 67^4}$$

i.e., $\log \text{G.M.} = \tfrac12(6 \log 75 - 4 \log 67)$

∴ G.M. = 94.0 very nearly.

Example XIX. A man gets three successive annual rises in salary of
9%, 12% and 16%, respectively, the percentage being reckoned in
each case on his salary at the end of the previous year. What average
per cent. increase does this represent?

$$\text{G.M. of } 1.09,\ 1.12 \text{ and } 1.16 = \sqrt[3]{1.09 \times 1.12 \times 1.16} = 1.123$$

i.e., required percentage = 12.3.
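
Examples XVI to XIX may all be checked with one weighted routine. The sketch below is an illustration in Python and not the author's method of working, which uses four-figure logarithms; the function and variable names are hypothetical.

```python
import math


def weighted_geometric_mean(values, weights):
    """Antilog of the weighted arithmetic mean of the logarithms."""
    log_sum = sum(w * math.log10(v) for v, w in zip(values, weights))
    return 10 ** (log_sum / sum(weights))


# Example XVI: simple G.M. (all weights equal).
xvi = weighted_geometric_mean([11.5, 12.7, 14.6], [1, 1, 1])

# Example XVII: weights 3, 4 and 2.
xvii = weighted_geometric_mean([11.5, 12.7, 14.6], [3, 4, 2])

# Example XVIII: product of all six numbers is 75**6, of four is 67**4,
# so the product of the remaining two is the quotient.
xviii = math.sqrt(75 ** 6 / 67 ** 4)

# Example XIX: average growth factor, expressed as a percentage rise.
xix = (weighted_geometric_mean([1.09, 1.12, 1.16], [1, 1, 1]) - 1) * 100

for label, value in [("XVI", xvi), ("XVII", xvii), ("XVIII", xviii), ("XIX", xix)]:
    print(f"Example {label}: {value:.3f}")
```

Full-precision arithmetic gives about 12.67 for Example XVII where four-figure tables give 12.68; the difference is the rounding of the logarithms, not an error of method.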

Geometrical Progression 

When the numbers in a series increase, or decrease, in a constant
ratio, the series forms a geometrical progression, as, e.g.,

    2, 4, 8, 16 . . .        common ratio = 2

i. b b i ■ • • .. = f 

Each term, excepting the first and last, is the geometric mean of
its next-door neighbours.

Thus $4 = \sqrt{2 \times 8}$

1 = vTxl- 

The sum of a series in G.P. may be obtained from the formula

$$S = \frac{a(r^n - 1)}{r - 1} \quad \text{when } r > 1$$

or

$$S = \frac{a(1 - r^n)}{1 - r} \quad \text{when } r < 1.$$



Thus the sum of 6 terms of the series 1, 1½, 2¼ . . .

$$= \frac{1\{(1.5)^6 - 1\}}{1.5 - 1} = 20.78125.$$

The nth term of a series is $ar^{n-1}$.

∴ the 6th term of the above series $= 1 \times (1.5)^5 = 7\tfrac{19}{32} = 7.59375.$
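
The two formulae for a geometrical progression can be verified on the series 1, 1½, 2¼ . . . The following is a minimal sketch in Python; the function names are illustrative and not from the text.

```python
def gp_term(a, r, n):
    """The nth term of a G.P. with first term a and common ratio r."""
    return a * r ** (n - 1)


def gp_sum(a, r, n):
    """The sum of the first n terms; the two textbook forms are algebraically equal."""
    if r == 1:
        return a * n                    # every term equals a
    return a * (r ** n - 1) / (r - 1)


sixth_term = gp_term(1, 1.5, 6)
sum_of_six = gp_sum(1, 1.5, 6)

# Cross-check by adding the six terms one by one.
direct_sum = sum(gp_term(1, 1.5, k) for k in range(1, 7))

print(sixth_term, sum_of_six, direct_sum)
```

The single expression serves for r < 1 as well as r > 1, since multiplying numerator and denominator by -1 turns one form into the other.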

The “ Compound Interest Law ” 

The amount of £P in n years at r% per annum, when the interest
is added annually, is given by

$$A = P\left(1 + \frac{r}{100}\right)^{n}$$

If interest were added monthly (12 times a year) we should have

$$A = P\left(1 + \frac{r}{1200}\right)^{12n}$$

and, in general, if interest be added t times a year

$$A = P\left(1 + \frac{r}{100t}\right)^{tn}$$

As t is increased, the period of time between successive payments
gets shorter and shorter, and if t became indefinitely great, interest
would be added continuously to the principal.

It can be shown that in this case

$$A = Pe^{nr/100}$$

where e = 2.71828.

Let P = £100, r = nominal rate per annum = 6%, n = 1 year.

Then $A = 100e^{0.06} = 106.184$

    log e       = 0.4342945
    0.06 log e  = 0.0260577
    log 100     = 2
                  ---------
    log A       = 2.0260577

Thus a nominal 6% paid continuously is equivalent to 6.184%
paid annually.
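
The approach of the amount to its continuous limit can be watched by increasing t. The sketch below is an illustration in Python under the notation of this section (P, r per cent., n years, t additions of interest a year); the function names are hypothetical.

```python
import math


def amount(principal, rate_pct, years, times_per_year):
    """Amount when interest at rate_pct per annum is added t times a year."""
    growth = 1 + rate_pct / (100 * times_per_year)
    return principal * growth ** (times_per_year * years)


def amount_continuous(principal, rate_pct, years):
    """Limiting amount when interest is added continuously."""
    return principal * math.exp(rate_pct * years / 100)


annual = amount(100, 6, 1, 1)
monthly = amount(100, 6, 1, 12)
daily = amount(100, 6, 1, 365)
continuous = amount_continuous(100, 6, 1)

for label, a in [("annually", annual), ("monthly", monthly),
                 ("daily", daily), ("continuously", continuous)]:
    print(f"interest added {label:>12}: {a:.3f}")
```

Each increase in t raises the amount a little, but the amounts never exceed the continuous figure of about £106.184.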


VII. The Harmonic Mean 


The harmonic mean between two numbers is the reciprocal of the
arithmetic mean of their reciprocals, e.g., the H.M. between 3 and
6 is the reciprocal of $\tfrac12(\tfrac13 + \tfrac16)$, i.e., of $\tfrac14$.

∴ H.M. = 4.

In general, the H.M. between a and b $= \dfrac{2ab}{a + b}$.





Certain problems involving “ rates ” may be so stated that the 
harmonic mean is the appropriate average to use, though such 
problems can always be restated so as to involve the arithmetic 
mean. 


Example XX. A man travels from A to B at 3 m.p.h. and returns
at 6 m.p.h. What is his average rate in m.p.h.?

The answer is not $\dfrac{3 + 6}{2} = 4.5$ m.p.h. It is obviously less than this,
since the man walks for the longer time at the slower rate.

Suppose the distance from A to B is 12 miles. Then time taken for double
journey = (4 + 2) hours and therefore rate in miles per hour $= \dfrac{24}{6} = 4$.

The H.M. of 3 and 6 $= \dfrac{1}{\tfrac12(\tfrac13 + \tfrac16)} = 4$.

Example XXI. A man having to drive 90 miles wishes to achieve an
average speed of 30 m.p.h. For the first half of the journey he averages
only 20 m.p.h. What must be his average for the second half of the
journey if his overall average is to be 30 m.p.h.?

Let his average speed for the second half be x m.p.h.

Then $\dfrac12\left(\dfrac{1}{20} + \dfrac{1}{x}\right) = \dfrac{1}{30}$

i.e., x = 60 m.p.h.

Example XXII. Apples are bought at 3 for 1s., 4 for 1s. and 6 for 1s.

$$\frac{1}{\text{H.M. of } 3, 4, 6} = \frac13\left(\frac13 + \frac14 + \frac16\right) = \frac13 \times \frac{18}{24} = \frac{6}{24}$$

∴ H.M. $= \dfrac{24}{6} = 4$

∴ average is 4 for 1s., i.e., 3d. each.

    3 for 1s. is equivalent to 4d. each
    4  ,,  1s.      ,,         3d.  ,,
    6  ,,  1s.      ,,         2d.  ,,

and the A.M. of 4d., 3d. and 2d. is 3d.

Note that this solution implies that equal numbers of apples were
bought at the different prices.
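
Examples XX to XXII can be checked together. The following is a minimal sketch in Python; the function and variable names are illustrative and not from the text.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)


# Example XX: equal distances out and home at 3 and 6 m.p.h.
round_trip_speed = harmonic_mean([3, 6])

# The same result from first principles, taking 12 miles each way.
check_speed = 24 / (12 / 3 + 12 / 6)

# Example XXI: solve (1/2)(1/20 + 1/x) = 1/30 for x.
required_speed = 1 / (2 / 30 - 1 / 20)

# Example XXII: apples at 3, 4 and 6 for a shilling.
apples_per_shilling = harmonic_mean([3, 4, 6])
pence_each = 12 / apples_per_shilling      # 12 pence to the shilling

print(round_trip_speed, check_speed, required_speed,
      apples_per_shilling, pence_each)
```

The harmonic mean is the right average in Example XX only because the two distances are equal; had the two times been equal, the arithmetic mean of 4.5 m.p.h. would have been correct.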

The harmonic mean of a series of different positive numbers is less
than the geometric mean of the series.

If A, G and H are the arithmetic, geometric and harmonic means,
respectively, between x and y, it can easily be shown that

$$A \times H = G^2 \quad \text{or} \quad A : G = G : H$$

i.e., G is the geometric mean between A and H.

$$\text{A.M. of } x \text{ and } y = \tfrac12(x + y)$$
$$\text{G.M. of } x \text{ and } y = \sqrt{xy}$$
$$\text{H.M. of } x \text{ and } y = \frac{2xy}{x + y}$$

Then $\tfrac12(x + y) : \sqrt{xy} = \sqrt{xy} : \dfrac{2xy}{x + y}$

since $\tfrac12(x + y) \times \dfrac{2xy}{x + y} = xy$.
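
The relation between the three means of two numbers can be confirmed numerically. The sketch below is illustrative only; the pair 3 and 12 is a hypothetical choice, not taken from the text.

```python
import math

x, y = 3.0, 12.0          # hypothetical pair of positive numbers

A = (x + y) / 2           # arithmetic mean
G = math.sqrt(x * y)      # geometric mean
H = 2 * x * y / (x + y)   # harmonic mean

print(f"A = {A}, G = {G}, H = {H}")
print(f"A x H = {A * H}, G squared = {G ** 2}")
```

The same figures show the order of the three means for unequal positive numbers: H is less than G, and G less than A.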


The Geometric and Harmonic Means of a Grouped Distribution

Example XXIII. Find the A.M., G.M. and H.M. of the following
distribution:

     (x)      (f)      f(log x)       f/x
      3        2        0.9542      0.6667
      4        5        3.0105      1.2500
      5        9        6.2910      1.8000
      6       14       10.8948      2.3333
      7       15       12.6765      2.1428
      8        8        7.2248      1.0000
      9        6        5.7252      0.6667
     10        3        3.0000      0.3000
     11        1        1.0414      0.0909
              --       -------     -------
              63       50.8184     10.2504

A.M. = 6.651

G.M. $= \text{antilog}\,\dfrac{50.8184}{63} = 6.406$

$\dfrac{1}{\text{H.M.}} = \dfrac{10.2504}{63} = 0.1627$

∴ H.M. = 6.146.
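
The three means of Example XXIII may be recomputed from the x and f columns alone. The following is a minimal sketch in Python, not the author's tabular working; the variable names are illustrative.

```python
import math

x_values = [3, 4, 5, 6, 7, 8, 9, 10, 11]
freqs = [2, 5, 9, 14, 15, 8, 6, 3, 1]

n = sum(freqs)   # total frequency

# Arithmetic mean: sum of f*x divided by n.
arithmetic_mean = sum(f * x for x, f in zip(x_values, freqs)) / n

# Geometric mean: antilog of (sum of f*log x) / n.
sum_f_log_x = sum(f * math.log10(x) for x, f in zip(x_values, freqs))
geometric_mean = 10 ** (sum_f_log_x / n)

# Harmonic mean: n divided by the sum of f/x.
sum_f_over_x = sum(f / x for x, f in zip(x_values, freqs))
harmonic_mean = n / sum_f_over_x

print(f"A.M. = {arithmetic_mean:.3f}")
print(f"G.M. = {geometric_mean:.3f}")
print(f"H.M. = {harmonic_mean:.3f}")
```

The results agree with the tabular working to within the rounding of four-figure logarithms, and stand in the expected order H.M. < G.M. < A.M.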


Exercise 1 

1. Find the mean and median of the following set of numbers: 10.7, 8.5, 6.2,
9.3, 7.4, 5.7, 4.6, 6.5, 7.7.

2. (i) Find, by grouping, the mean of the following numbers: 3, 4, 5, 5, 4,
3, 6, 7, 8, 4, 5, 6, 8, 9, 9, 7, 5, 6, 6, 7.

Using arbitrary origin A = 5, show that 

. £/(* - A) _ 

n 

(ii) Find also the median.

3. No. of children in family:    0    1    2    3    4    5    6
   No. of families:             15   30   25   19    8    2    1     Σf = 100

Find, for this group of 100 families:

(a) The mean number of children per family.
(b) The median number of children per family.
(c) The modal number of children per family.

4. Length in cm.:   2.0   2.5   3.0   3.5   4.0   4.5   5.0
   f:                 5    38    65    92    70    40    10     Σf = 320

The measurements represent the mid-points of ½-cm. intervals:

(a) Calculate the mean length.

(b) Rewrite the table

        (x).          (f).
     1.75-2.25          5
     2.25-2.75         38   etc.,

and find the median length, by calculation.

5. No. of persons per house:    1    2    3    4    5    6    7    8    9   10
   No. of houses:              26  113  120   95   60   42   21   14    5    4     Σf = 500

Calculate the mean, median and modal numbers of persons per house.

6. The average mark for arithmetic in a class of 30 was 52. The top six
students had an average of 80, and the bottom ten an average of 31. What
was the average mark of the other students?

7. A motor-car is driven for 4 miles at the rate of 20 m.p.h., and for a
further 4 miles at a rate of 30 m.p.h. What is the average rate?

8. A man gets a rise of 10% in salary at the end of his first year of service,
and further rises of 20% and 25% at the end of the second and third years,
respectively, the rise in each case being calculated on his salary at the beginning
of the year. To what annual percentage increase is this equivalent?

9. If the interest paid on each of three different sums of money, yielding
3%, 4% and 5% simple interest per annum, respectively, is the same, what
is the average yield per cent. on the total sum invested?


10.  Weekly wage-rates.     No. per 1000
         s.    d.            employees.
         52    6                 57
         55    7                175
         61    9                352
         62    4                240
         67    8                125
         73    7                 51
                               ----
                               1000

Calculate the average wage-rate, correct to the nearest 1d.


11.* x:   10-   11-   12-   13-   14-   15-   16-17
     f:    4    11    20    30    19    10     6        Σf = 100

Find the mean, median and quartiles of the above distribution.

12.* x:   10-   15-   20-   25-   30-   35-   40-   45-50
     f:   246  1164  1146  2225  1815  1262   723   299     Σf = 8880

Find the mean, median and quartiles of the above distribution.

* See note on page 23, relating to limits of class-intervals and values of
mid-points of intervals.

13. Draw an accurate cumulative frequency graph of the following distribu-
tion and write down the approximate numbers of widows of each year of age
between 50 and 60.

                U.K., 1911
        Age Distribution of Widows

      Ages.            Numbers per 1000.
    Under 25                   2
    25-35                     29
    35-45                     91
    45-55                    174
    55-65                    244
    65-75                    291
    75 and over              169
                            ----
                            1000

Total number of widows, 1911, = 1,751,000.

(L.U. Inter. Com.)

14. Determine the median and quartile ages of widows from the graph of
Question 13, and check your results by calculation.

15. The population of a country at a census in mid-1900 was P. In the
census of mid-1910 it was 1.125P. Find the annual rate of increase, assuming
that it was the same each year. Use this rate to find the population in mid-
1905 and show that it is $(P \times 1.125P)^{1/2}$.

16. The arithmetic mean of two numbers is 10, their geometric mean is 8.
(a) Find the harmonic mean of the two numbers. (b) Find the two numbers.

17. Distribution of Heights of 1080 Individuals

Height in inches:   57.5-   60-   62.5-   65-   67.5-   70-   72.5-
Number:                6     26    190    281    412    127     38     Σf = 1080

Compute the median and quartiles of the distribution; check the results on
a graph.

(L.U. B.Sc. Econ., 1941)

18. Ages of Males and Females Marrying in 1937
(In thousands)

    Age (years).       Male.     Female.
    15-19                11         53
    20-24               107        140
    25-29               138         98
    30-34                51         32
    35-39                18         12
    40-44                10          9
    45-49                 6          5
    50-54                 5          3
    55-59                 5          3
    60 and over           6          2


19. x:   15-  20-  25-  30-  35-  40-  45-  50-  55-  60-  65-70
    f:    3   10   22   35   24   17    8    6    4    -    2      Σf = 131

Calculate the mean, median and quartiles of the above distribution.

20. A given machine is assumed to depreciate 40% in value in the first
year, 25% in the second year and 10% per annum for the next three years,
each percentage being calculated on the diminishing value. What is the
average percentage depreciation, reckoned on the diminishing value, for the
five years?

21. In 1921 the populations of four towns were 25,732, 29,508, 15,877 and
40,250, respectively. In 1931 the first three had increased by 8%, 4% and
10%, respectively, while the fourth had decreased by 7%. Assuming the
“errors” in the data to be negligible, find the average percentage increase for
the four towns taken together.

22. In 1921 the populations of three towns, A, B, C, were in the ratio
1 : 1.37 : 2.52. In 1931 A : B : C = 1 : 1.52 : 2.67.

If the population of A had increased by 5% between 1921 and 1931, find
(a) the percentage increase in B and C for the ten years and (b) the average
percentage increase for the three towns, taken together, assuming “errors”
in the data to be negligible.

23. A firm buys £10,000 worth of plant on Jan. 1, 1939, and arranges for
depreciation according to the following schedule:


Category. 

Cost (£) 

Depreciation rate. 

General plant 

4000 

f>% on reducing balance 

Electrical plant 

1000 


Process plant 

3000 

10% 

Lorries 

2000 

20% 


(a) Find average depreciation rates for 1939 and 1940.

(b) State as concisely as possible the reason for the difference between the
two averages.

(Soc. I.A.A.)

24. Plot (a) a histogram and (b) a cumulative frequency curve (ogive) from
the following data:

Number of Dairy Farms in England and Wales, according to the Cost of
Production of Milk, 1935-36.

Cost of production in pence
per gallon:           4-6    6-8    8-10    10-12    12-14    14-16
Number of farms:       13    113     182      105       19        7     = 437

Find the approximate value of the median from the cumulative curve and
mark that value on the histogram.

(Soc. I.A.A.)

25. Find also the values of the quartiles in the above distribution from the
ogive.

Calculate the mean cost of production from the frequency table.


26.  Weekly earnings (dollars).     No. of persons.
         25-                              25
         26-                              70
         27-                             210
         28-                             275
         29-                             430
         30-                             550
         31-                             340
         32-                             130
         33-                              90
         34-                              55
         35-36                            25
                                        ----
                                        2200

(i) What was the most usual wage?
(ii) What was the mean wage?
(iii) What was the median wage?
(iv) State the wage limits for the central 50% of the wage-earners.
(v) What percentage of the workers engaged earned between $28.5 and
$30.5?
(vi) What percentage earned more than $31.50?
(vii) What percentage earned less than $27.50?

27. x:   15-  20-  25-  30-  35-  40-  45-  50-  55-  60-  65-  70-75
    f:    2    5    8   11   15   20   20   17   16   13   11    5      Σf = 143

Calculate the mean and median values of x. Calculate also the quartiles
and obtain an approximate value of the mode.

28.                XYZ Company
        Weekly Commission paid to Salesmen

    Amount of commission.        No. of salesmen.
    2.0 but under 2.5                    4
    2.5  ,,   ,,  3.0                    7
    3.0  ,,   ,,  3.5                   11
    3.5  ,,   ,,  4.0                   15
    4.0  ,,   ,,  4.5                   25
    4.5  ,,   ,,  5.0                   42
    5.0  ,,   ,,  5.5                   53
    5.5  ,,   ,,  6.0                   38
    6.0  ,,   ,,  6.5                   24
    6.5  ,,   ,,  7.0                   12
    7.0  ,,   ,,  7.5                    5
    7.5  ,,   ,,  8.0                    1
                                       ---
                                       237

Calculate mean, median and quartiles.

29. The purchase price of a National Savings Certificate is 10s., and the
repayment value, after 10 years, is £1 0s. 6d.

Find:

(i) The percentage rate of simple interest per annum.

(ii) The percentage rate of compound interest per annum.

(iii) The percentage rate, assuming that interest is added continuously
at each moment of time ($\log_{10} e = 0.43429$). (L.U. Inter. Com.)

30. Find the median and quartile ages of the 210 persons of Example VI,
page 16.

31. Draw a cumulative frequency curve of the distribution of Example VII,
page 17, and from it find the median and quartile heights. Find also the
percentage of persons with heights between 63.5" and 66.5".

Hint. Assume that the intervals are 59-61, 61-63, etc., and rewrite the table:

     2 persons are under 61"
    10    ,,    ,,   ,,  63", etc.

32. Calculate the mean weight of the 245 persons of Example XI, page 21.

33. Calculate the mean wage of the 250 wage-earners of Example XIII,
page 24.

34. Population of England and Wales: Males
Age Distribution, Actual and Estimated
('000)

    Age group.              1881.       1951.
    Under 15                4,740       3,785
    15 and under 30         3,380       4,150
    30  ,,   ,,  45         2,250       4,639
    45  ,,   ,,  60         1,430       3,722
    60  ,,   ,,  75           710       2,232
    75 and over               110         534
                           ------      ------
                           12,620      19,062

Draw ogives of the above age distributions and from them determine:

1. The median and quartile ages in 1881 and in 1951.

2. The percentage of the population under 21 years of age at both dates.

3. The percentage of the population between 25 and 40 years of age
at both dates.

35. Rise in Price of 300 Commodities between Two Dates

    % increase.             Frequency.
     0-                         12
     5-                         30
    10-                         51
    15-                         84
    25-                         66
    35-                         35
    45-                         15
    60 and under 80              7
                               ---
                               300

(i) Calculate the mean rise in price.

(ii) Find from an ogive the median and quartile rises in price, and the
number of commodities which rose by 20 and less than 40%.

(iii) Draw a histogram to illustrate the table. Hor. scale 1" = 10%
increase; area scale 1 sq. in. = 20 commodities.



36.                     X Urban District

Private Families Classified by Size of Family, Rooms Occupied and Density of Occupation
[Census of 1931 data adjusted to simplify calculations]

[Two-way table of families by rooms occupied (x) and persons in family (y); body of table not recoverable.]


families, etc j 3 340 lo3 140 47 27 7 717 








The above table shows two distributions:

(i) No. of rooms occupied (x) and no. of families occupying such nos.
of rooms ($f_x$).

(ii) No. of persons in families (y) and no. of families with such nos. in
family ($f_y$).

The numbers in the body of the table are numbers of families occupying
x rooms and having y persons in family,

e.g. in row 1, 28 families with 1 person in family occupy 2 rooms each;
     in row 2, 34 families with 2 persons in family occupy 3 rooms each.

And so on.

Calculate:

(a) The average (mean) number of persons per family.

(b) The mean number of rooms occupied per family.

(c) Complete columns (j) to (p).

37. Put the following information in the form of a frequency distribution
and make an estimate of the mean wage.

In a certain group of wage-earners the median and quartile wages were
37s., 29s. 6d. and 40s. 6d. per week respectively. 6% of the workers got less
than 20s. per week, while 3% got 45s. and over per week.

Hint. First put the information in the form of a cumulative frequency
table.

38. Calculate the arithmetic, geometric and harmonic means of the
following frequency distribution:

    x:    5    6    7    8    9   10   11
    f:    2    4    7   10    9    6    2      Σf = 40

39. Estimate from the following table the average thickness of seam for
each district.

                 Thickness of Seams in Coal-mines, 1944

                               Percentage of total output.
                     Under   2' and    3' and    4' and    5' and    6' and
District.             2'.    under 3'. under 4'. under 5'. under 6'.  over.    Total.

Northumberland        3.9     43.5      33.3      13.4       4.3      1.6      100%
Derbyshire, South      -       0.6       5.2      35.8      22.5     35.9      100%
Yorkshire, South      0.8     19.3      26.1      31.0      15.0      7.8      100%

[Ministry of Fuel and Power, Statistical Digest, 1945] (Inst. T., 1948)

With limits taken as 1' and 8' the results for Northumberland and Derby-
shire are 3.26' and 5.56'. The lower limit does not affect the results much,
but the upper limit will obviously affect the result for Derbyshire considerably,
since more than a third of the coal comes from seams of 6' and over. If the
upper limit is taken at 9', the width of seam for Derbyshire becomes 5.74'.
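
The sensitivity of the result to the limits assumed for the open-ended classes is easily explored. The sketch below is an illustration in Python; the function name and its two keyword parameters are hypothetical, and the limits of 1', 8' and 9' are the assumptions named in the note above.

```python
# Percentage of output from each thickness class, as tabulated.
shares = {
    "Northumberland":    [3.9, 43.5, 33.3, 13.4, 4.3, 1.6],
    "Derbyshire, South": [0.0, 0.6, 5.2, 35.8, 22.5, 35.9],
    "Yorkshire, South":  [0.8, 19.3, 26.1, 31.0, 15.0, 7.8],
}


def mean_thickness(percentages, lower=1.0, upper=8.0):
    """Weighted mean of class mid-points, with assumed outer limits in feet."""
    edges = [lower, 2, 3, 4, 5, 6, upper]
    midpoints = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
    total = sum(p * m for p, m in zip(percentages, midpoints))
    return total / sum(percentages)


for district, pcts in shares.items():
    at_8 = mean_thickness(pcts, upper=8.0)
    at_9 = mean_thickness(pcts, upper=9.0)
    print(f"{district:<18} upper 8': {at_8:.2f}'   upper 9': {at_9:.2f}'")
```

Moving the upper limit from 8' to 9' shifts the last mid-point by half a foot, so each district's mean rises by half its percentage in the top class: a good deal for Derbyshire, very little for Northumberland.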

40. Verify the following relations:

(i) $\sum afx = a\sum fx$, where a is a constant.

(ii) $\sum f(x + a) = \sum fx + Na$, where $N = \sum f$.

(iii) $\sum f(x + a)^2 = \sum fx^2 + 2a\sum fx + Na^2$.

41. (i) If there are N pairs of the variables x and y, e.g., $x_1$ and $y_1$, $x_2$ and
$y_2$, and we write:

$$z_1 = x_1 + y_1$$
$$z_2 = x_2 + y_2, \text{ etc.}$$

Show that $\bar{z} = \bar{x} + \bar{y}$.

(ii) If a and b are constants and $z_1 = ax_1 + by_1$, etc., show that
$\bar{z} = a\bar{x} + b\bar{y}$.

42. Let the triangle ABC represent a frequency distribution. Values of x
are measured along AB, from x = 0 to x = 6. AC = 4.

Find (1) a value of x, say $x_1$, such that the ordinate at $x_1$ passes through
the centre of area (centroid) of the figure; and (2) a value $x_2$, such that the
ordinate at $x_2$ bisects the figure in area.

What do the values $x_1$ and $x_2$ represent?

Hint. (1) The medians of the triangle intersect at the centroid.

(2) This is merely bisecting the triangle in area by a line parallel to AC.


43. The isosceles triangle represents a frequency distribution where x takes all
values from $-a$ to $+a$. $\bar{x} = 0$.

Show that the quartiles are $-(a - a/\sqrt{2})$ and $(a - a/\sqrt{2})$, i.e. $Q_1 = -0.293a$,
and $Q_3 = +0.293a$.

44. The following data from the monthly Digest of Statistics relate to coal
production in Great Britain, all figures being weekly averages:

                                    1941.  1942.  1943.  1944.  1945.  1946.  1947.

Production, mined coal
  ('000 tons)                       3957   3905   3730   3521   3350   3476   3578

Wage-earners on colliery
  books ('000)                       697    708    706    708    707    697    712

Average number of shifts
  worked (per wage-earner
  per week)                         5.25   5.22   5.12   4.96   4.73   4.84   4.68

Calculate the number of man-shifts worked and the average output per
man-shift in each year.

Comment on the development of coal production in the period.

(L.U. B.Sc. Econ., 1948)


1941. (i) Using the figures as stated:

    No. of man-shifts worked = 5.25 x 697 x 52 x 10³
                             = 190,281,000

    Average output per man-shift = 3957 ÷ (5.25 x 697)
                                 = 1.08 tons.

(ii) Noting that the number of wage-earners is stated to the nearest
thousand and the average number of shifts per wage-earner per week to the
nearest second place of decimals, it is clear that any number of man-shifts
worked between (5.255 x 697,500 x 52) = 190,598,850 and (5.245 x
696,500 x 52) = 189,963,410 would be consistent with the data; i.e., the
possible range of the answer is (190,281 ± 318) x 10³. For a discussion of
estimates and “possible errors” see Chapter IV.

The answers given were calculated from the data as given. The reader, as
an exercise, should work out what possible percentage errors are involved in
this procedure.
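
The bounds in part (ii) follow from pushing each rounded figure to the edge of its rounding interval. The sketch below is an illustration in Python for the year 1941 only; the variable names are hypothetical, and the half-unit allowances (0.005 of a shift, 500 wage-earners) are those used in the note above.

```python
WEEKS = 52

shifts_per_week = 5.25      # stated to two places of decimals
wage_earners = 697_000      # stated to the nearest thousand
weekly_output_tons = 3_957_000

# (i) Figures taken as stated.
man_shifts = shifts_per_week * wage_earners * WEEKS
output_per_shift = weekly_output_tons / (shifts_per_week * wage_earners)

# (ii) Largest and smallest values consistent with the rounding.
upper = (shifts_per_week + 0.005) * (wage_earners + 500) * WEEKS
lower = (shifts_per_week - 0.005) * (wage_earners - 500) * WEEKS

half_range = (upper - lower) / 2
percentage_error = 100 * half_range / man_shifts

print(f"man-shifts worked: {man_shifts:,.0f}")
print(f"output per shift:  {output_per_shift:.2f} tons")
print(f"possible range:    {lower:,.0f} to {upper:,.0f}")
print(f"possible error:    about {percentage_error:.2f}%")
```

The same few lines, run over the other six years of the table, would supply the remaining answers to the question.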

45. The following table shows the number of adults and number of children
in each of 300 families and the number of rooms occupied by them.

Tabulate these so as to show the number of families consisting of 1, 2, 3 . . .
persons (adults and children) in relation to the number of rooms occupied,
working out average for families of each size.

Also make a table of which the horizontal headings are the number of
children and the vertical list the number of adults. Compute the number of
children per 100 families which contain 1, 2, 3 . . . adults respectively.


d 



G 



d 



d 


t n 

4 -> 

a> 

u 

(A 

e 

0 ) 

8 

2 

w 

a 

03 

4-> 

£ 

TJ 

03 

B 

03 

<D 

U 

12 

03 

a 

13 

5 

0 

0 

*3 

jG 

o 

o 


2 

0 

o 

13 

*J3 

8 

< 

V 


< 

u 


< 

u 


< 

u 

« 

1 

0 

2 

2 

0 

1 

3 

0 

1 

5 

0 

4 

1 

0 

2 

2 

0 

2 

3 

0 

2 

5 

0 

4 

1 

0 

2 

2 

0 

3 

3 

0 

3 

2 

3 

1 

1 

0 

2 

2 

0 

4 

3 

0 

4 

2 

3 

2 

1 

0 

1 

2 

0 

5 

3 

0 

2 

2 

3 

3 

1 

0 

1 

2 

0 

1 

2 

2 

1 

2 

3 

4 

1 

0 

1 

2 

0 

2 

2 

o 

2 

2 

3 

2 

1 

0 

1 

2 

0 

3 

2 

2 

3 

4 

1 

3 

1 

0 

1 

2 

0 

4 

2 

2 

4 

5 

0 

4 

1 

0 

1 

2 

0 

1 

2 

2 

1 

7 

0 

4 

1 

0 

3 

o 

0 

o 

2 

2 

2 

5 

1 

1 

1 

0 

1 

2 

0 

3 

2 

2 

3 

5 

1 

2 

1 

1 

2 

2 

0 

1 

2 

2 

2 

6 

1 

2 

2 

1 

1 

2 

0 

2 

2 

2 

4 

7 

1 

5 

2 

1 

1 

2 

0 

3 

2 

2 

2 

4 

2 

3 

2 

1 

2 

2 

0 

2 

4 

0 

2 

4 

2 

4 

2 

1 

2 

2 

0 

o 

4 

0 

3 

5 

2 

3 

2 

1 

2 

2 

0 

1 

4 

0 

4 

6 

2 

5 

2 

1 

2 

2 

0 

3 

4 

0 

5 

3 

3 

1 

2 

1 

3 

3 

1 

1 

4 

0 

3 

2 

4 

2 

o 

1 

3 

3 

1 

2 

4 

1 

1 

2 

4 

3 

2 

1 

4 

3 

1 

3 

4 

1 

2 

2 

4 

4 

2 

1 

3 

3 

0 

1 

4 

1 

3 

4 

3 

2 

o 

1 

4 

3 

0 

2 

3 

2 

3 

2 

5 

2 

1 

2 

2 

3 

0 

3 

5 

0 

1 

3 

5 

3 


(L.U. B.Com.)


46. The average of ten different experiments gave the weight of a body as
25.686 gms.

The average of the first four experiments was 26.680 gms. and that of the
last three 25.686 gms.

If the average of the fifth and sixth experiments was 0.042 gms. greater
than the result of the seventh experiment, what was the result of the seventh?

47. British Railway Statistics, 1948 and 1947. Total: All Freight

                                        1948.           1947.
    Freight train takings (£000)       181,983         156,841
    Net ton-miles ('000)            21,456,842      20,863,527
    Tons forwarded ('000)              276,117         258,086
    Loaded wagon-miles ('000)        3,311,498       3,251,599
    Loaded train-miles ('000)          118,396         113,176
    Miles of road                       19,631          19,639

[Ministry of Transport.]

Compute:

(i) The average takings in each year (a) per net ton-mile and (b) per
loaded wagon-mile, in pence correct to two places of decimals.

(ii) The average wagon load in tons, correct to two places of decimals.

(iii) The average length of haul, correct to the nearest one-tenth of a
mile.

48. British Coal Mines

                              Thickness of seam in feet
                   Under    2 and     3 and     4 and     5 and     6 and
Region.              2.    under 3.  under 4.  under 5.  under 6.    over.

              Machine Cut as Percentage of Total Output.

Northern A .        99.8     97.5      95.9      95.1      76.4      99.9
    ,,   B .        81.6     83.5      88.0      75.8      81.4      63.8
North Eastern       91.3     95.6      85.3      69.4      40.4      43.8
Wales .             36.0     58.3      59.9      52.3      39.4      27.4
Great Britain .     84.8     89.2      88.1      77.3      67.3      54.4

        Percentage of Total Output from Each Thickness of Seam

Northern A .         4.3     42.7      32.0      14.9       4.2       1.9
    ,,   B .        10.7     28.0      26.4      18.2      14.0       2.7
North Eastern        2.0     22.7      30.7      28.2      10.8       5.6
Wales .              2.5     22.6      24.1      21.7      15.4      13.7
Great Britain .      3.7     22.0      29.3      24.2      12.7       8.1

[Source: Supplement to Statistical Digest, 1945. Ministry of Fuel and Power]

Compute for each region and for Great Britain (i) the average thickness of
seam, and (ii) the percentage of total output which is machine cut.


49. Marks of 268 Candidates in an Examination

           0    1    2    3    4    5    6    7    8    9    Total (f)
   0-      -    -    -    -    -    -    -    -    -    -        -
  10-      1    -    1    -    -    1    2    1    -    -        6
  20-      3    1    1    1    2    1    1    1    1    2       14
  30-      2    4    4    5    3    6    5    5   10    4       48
  40-      4    9    8    6    5   11    6    6    8    8       71
  50-      3    3    6    7    4    6    5   11    2    6       53
  60-      7    4    3    8    7    5    4    1    2    4       45
  70-      3    2    3    1    1    5    2    2    1    2       22
  80-      -    -    1    -    -    3    1    1    -    -        6
  90-      1    -    -    -    2    -    -    -    -    -        3
          --   --   --   --   --   --   --   --   --   --      ---
          24   23   27   28   24   38   26   28   24   26      268

(i) Find the mean, median and quartiles of the above distribution, using
the individual scores as given in the body of the table, e.g., 1 candidate has
10 marks, 1 has 12, 1 has 15, 2 have 16 and 1 has 17, etc.

(ii) Find the mean, median and quartiles of the distribution when condensed
into the two end columns.


50.
                                               % of occupied     % of occupied
                                Persons per    dwelling units    dwelling units
                 Population     dwelling       with more than    with more than
                 in Oct.,       unit,          1 person per      2 persons per
Towns.           1942.          1942.          room, 1942.       room, 1942.

Luton            100,480        3.58           19.82             1.20
Romford           60,487        3.59           47.47             2.86
Watford           68,789        3.71           20.69             1.29
Northampton      100,502        3.46           14.31             1.05
Enfield .         95,579        3.38           13.78             0.91
Harrow .         193,745        3.32           13.80             0.77

[Source: Impact of the War on Civilian Consumption. H.M.S.O., 1945]

Calculate for the above group of towns the average percentage of occupied
dwellings with (a) more than 1 person per room, (b) more than 2 persons per
room.

51. A liquid is sold in bottles of a nominal capacity of 1 litre. A sample of
200 bottles is selected at random and the capacity of each determined. The
following table shows the resulting grouped frequency distribution of the
capacity of the bottles:

    Capacity (c.c.)         No. of bottles.
     995.0- 995.5                 49
     995.5- 996.0                 16
     996.0- 996.5                 15
     996.5- 997.0                 14
     997.0- 997.5                 10
     997.5- 998.5                 12
     998.5- 999.5                 14
     999.5-1001.0                 13
    1001.0-1002.5                 11
    1002.5-1005.0                 14
    1005.0-1010.0                 18
    1010.0-1030.0                 14
                                 ---
                                 200

(i) Represent these data in the form of a histogram.

(ii) Calculate the arithmetic mean of the capacity of the distribution.

(iii) What are the relative advantages and disadvantages of the arithmetic
mean, the median and the mode of this distribution as measures of “average
capacity”?

(R.S.S. Certificate, 1951)



CHAPTER II 


WEIGHTED AVERAGES. RATES. 

USE OF AVERAGES IN CONNECTION 
WITH THE ANALYSIS OF TIME SERIES 

I. Weighted Averages 

“Weighted average” is the name given to an average when the
various values of the variable have different frequencies; or when
the several items, of which it is desired to find an average, are given
multipliers which express, more or less adequately, their relative
importance in some connection, for example, in the compilation of
index-numbers.

If $x_1, x_2, \ldots x_n$ be values of the variable with weights $w_1, w_2, \ldots w_n$
respectively,

then the weighted mean $= \dfrac{\sum wx}{\sum w}$.

If, for example, 10 lb. of tea are bought at 2s. 6d. per lb., 15 lb. at
3s. per lb. and 5 lb. at 4s. per lb., the weighted average price is

$$\frac{(10 \times 2\tfrac12) + (15 \times 3) + (5 \times 4)}{10 + 15 + 5}\ \text{shillings}$$

i.e., 90/30 s. or 3s. per lb., which result is simply total cost divided
by total number of units purchased.

Example X, Chapter I, is a weighted arithmetic mean, where the
weights are 100, 25, 65 and 10, i.e., the numbers in the whole group
and in the three sub-groups.

Example XVII, Chapter I, illustrates a weighted geometric
mean, and Example XXIII a weighted geometric and a weighted
harmonic mean. The mean of any frequency distribution is a
weighted mean.

In many calculations it is permissible to use, instead of the actual
frequencies, values or quantities which are the indicated weights,
numbers which are roughly proportional to them. This
simplification of the weights, if carried out with common sense, leads
to no sensible loss of accuracy. Naturally the order of magnitude
of the weights should be preserved.

Suppose the series to be averaged is $x_1, x_2, \ldots x_n$ with weights
$w_1, w_2, \ldots w_n$, then

(i) If large, medium and small values of the variable have
their weights in that order of magnitude it is obvious that
$\dfrac{\sum wx}{\sum w}$ will be $> \dfrac{\sum x}{n}$; that is, the weighted average will be
greater than the “unweighted” average.

(ii) If the sizes of the variable and of the weights are in
reverse order of magnitude (i.e., the bigger values with the
smaller weights, etc.) then $\dfrac{\sum wx}{\sum w}$ will be $< \dfrac{\sum x}{n}$.

(iii) If neither of the above relations between x and w
obtains, then the weighted average may be greater than, equal
to or less than the simple average.

Example I.

District    Acreage of district    Average yield per annum
                                   in bushels per acre
(a)             4,601,524                  34.7
(b)             2,891,789                  31.5
(c)             1,859,862                  28.8
(d)               940,233                  40.5

Find the average yield per acre for the four districts combined. 

The reader should check the following results. 

(i) Average yield using the given acreages as weights
                                                   = 33.3 bushels per acre
(ii) Average yield using weights 46, 29, 19, 9     = 33.2 bushels per acre
(iii) Average yield using weights 5, 3, 2, 1       = 33.3 bushels per acre

In the above example note that the separate yields are themselves
averages, and that the "hundreds" figures in the acreages
can have little significance.
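The three results of Example I can be checked with a short Python sketch. The function name `weighted_mean` and the variable names are illustrative only; the simplified weights are the acreages in units of 100,000 (taking the first weight as 46) and in rough millions.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Yields (bushels per acre) and acreages of districts (a) to (d)
yields = [34.7, 31.5, 28.8, 40.5]
acreages = [4_601_524, 2_891_789, 1_859_862, 940_233]

exact = weighted_mean(yields, acreages)          # full acreages as weights
rough = weighted_mean(yields, [46, 29, 19, 9])   # acreages in units of 100,000
crude = weighted_mean(yields, [5, 3, 2, 1])      # acreages in rough millions
simple = sum(yields) / len(yields)               # unweighted mean, for contrast

print(round(exact, 1), round(rough, 1), round(crude, 1), round(simple, 1))
```

The unweighted mean is noticeably higher than any of the weighted results, because the smallest district has the largest yield.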

Averaging averages is a proceeding which presents many problems,
and the results are frequently somewhat surprising at first sight.

Example II. "On a certain railway with five divisions the cost per
ton-mile in each of the five divisions was less in July than it had been
in the previous June, and yet the cost per ton-mile was greater over the
system as a whole in July than it had been in June." Show how this
is possible.

                      June                              July
Division    Ton-miles    Cost per ton-      Ton-miles    Cost per ton-
            ('000)       mile (pence)       ('000)       mile (pence)
A             120            6                240            5¾
B             210            5                400            4¾
C             350            4                200            3½
D             400            3½               300            3¼
E             500            3                300            2½

Cost per ton-mile in June = 3.84d.
Cost per ton-mile in July = 3.96d.


The above figures are quite fictitious. They do, however,
illustrate what may happen in practice if a considerable alteration
takes place between two dates in the nature of the freight traffic
in the various divisions.

Example III. A and B were respectively leaders and runners-up in 
a football league in a certain year. Here are their goal averages. 

          Home                 Away                 Seasonal averages
                                                    (home and away)
A     56-17, av. 3.29      30-26, av. 1.15       86-43, av. 2.0
B     64-18, av. 3.55      54-43, av. 1.26      118-61, av. 1.93

B therefore had a better goal average both for home and for away 
matches, but A had a better average for both combined. 

We cannot deduce a seasonal average from a knowledge merely of the 
averages of home and away matches. The actual number of goals 
scored, it can easily be seen, affects the combined result. 

Suppose a team has a home average of 2.0 and an away average of
1.5. Then their seasonal average must lie somewhere between 2.0 and
1.5. If more goals are scored at home than away, their average will be
nearer to 2.0 than to 1.5, and vice versa.
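The reversal in Example III can be reproduced in a few lines of Python. This is a sketch only: it assumes each pair of figures in the table is (goals for, goals against) and that a goal average is the first divided by the second. The names `home`, `away`, `goal_average` and `seasonal` are illustrative.

```python
# (goals for, goals against), as read from the table in Example III
home = {"A": (56, 17), "B": (64, 18)}
away = {"A": (30, 26), "B": (54, 43)}

def goal_average(goals_for, goals_against):
    return goals_for / goals_against

def seasonal(team):
    """Combined average: total goals for divided by total goals against."""
    goals_for = home[team][0] + away[team][0]
    goals_against = home[team][1] + away[team][1]
    return goal_average(goals_for, goals_against)

for team in ("A", "B"):
    print(team,
          round(goal_average(*home[team]), 2),
          round(goal_average(*away[team]), 2),
          round(seasonal(team), 2))
```

The combined figure is a weighted mean of the home and away averages, with the goals conceded as weights; B's weights fall far more heavily on its poor away record than A's do.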

II. Rates 

The crude death-rate for a given city or district is

$\dfrac{\text{Number of deaths} \times 1000}{\text{Population}}$.

As between one city and another, or one district and another, this 
affords no real comparison, as the proportions of the different age- 
groups in the local populations may differ very considerably ; and 
different age-groups exhibit, naturally, very different mortality 
rates. The crude death-rates are therefore standardized by applying 
specific rates for each age-group in each district to the age-groups 
of a “standard population”.


Example IV. Find the crude and standardized death-rates of 
Districts A and B from the following data. 


                        District A                           District B
Age range      Popula-   No. of    Specific       Popula-   No. of    Specific
               tion      deaths    rate           tion      deaths    rate
0-10            2,000      50       25             1,000      20       20
10-55           7,000      75       10.7           3,000      30       10
55 and over     1,000      25       25             2,000      40       20
Total          10,000     150                      6,000      90

District A: Crude rate $= \dfrac{150 \times 1000}{10{,}000} = 15$.

District B: Crude rate $= \dfrac{90 \times 1000}{6000} = 15$.


Suppose the “ standard population ” which is used as a basis of 
comparison has the following age distribution per thousand: 

Age distribution. Numbers 

0-10 216 

10-55 583 

55 and over 201 


Standardized Rates

                  District A                          District B
Standard       Specific     Deaths        Standard       Specific     Deaths
population     rates        per 1000      population     rates        per 1000
   216           25           5.40           216           20           4.32
   583           10.7         6.24           583           10           5.83
   201           25           5.02           201           20           4.02
  1000                       16.66          1000                       14.17

Standardized rate for A = 16.7.
Standardized rate for B = 14.2.


In district A only one-tenth of the population is 55 and over, while 
in district B the proportion is one-third. 
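The crude and standardized rates of Example IV can be recomputed as follows. This is a minimal sketch using the figures of the example; the function names are illustrative, and the specific rates are carried unrounded, so the age-group contributions differ slightly from the printed two-decimal figures.

```python
def specific_rates(deaths, population):
    """Deaths per 1000 within each age-group."""
    return [1000 * d / p for d, p in zip(deaths, population)]

def crude_rate(deaths, population):
    """Total deaths per 1000 of the total population."""
    return 1000 * sum(deaths) / sum(population)

def standardized_rate(rates, standard_per_1000):
    """Apply the specific rates to a standard population of 1000."""
    return sum(r * s for r, s in zip(rates, standard_per_1000)) / 1000

standard = [216, 583, 201]            # age-groups 0-10, 10-55, 55 and over
pop_a, deaths_a = [2000, 7000, 1000], [50, 75, 25]
pop_b, deaths_b = [1000, 3000, 2000], [20, 30, 40]

std_a = standardized_rate(specific_rates(deaths_a, pop_a), standard)
std_b = standardized_rate(specific_rates(deaths_b, pop_b), standard)
print(crude_rate(deaths_a, pop_a), crude_rate(deaths_b, pop_b))
print(round(std_a, 1), round(std_b, 1))
```

Both districts have the same crude rate, yet A's specific rates are higher in every age-group; standardization brings this out.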

In practice male and female rates should be computed separately,
since mortality rates at all ages differ considerably as between males
and females.

Annual death-rates for the whole of a country, e.g., England and
Wales, are made comparable by applying the specific age-group rates
in each year to the “standard million” of 1901 or some other similar
standard.

Standardized death-rates of England and Wales until 1940 are stated
as if the age and sex distribution of the population had been the same
as in 1901. From 1941 onwards they are calculated on the basis of the
age and sex constitution of the population of Great Britain at the year
of observation.

Example V.

                          Unemployment Percentage

                  Age distribution per 1000 of
                  population                          Percentage unemployed
Age group      District   District   Whole        District   District   Whole
               A          B          country      A          B          country
15-29            300        200        240           3.5        3          4
30-44            320        350        360           9          8          7.5
45-59            325        300        290          11.5       15         12.5
60 and over       55        150        110          22         25         18
                1000       1000       1000

Calculate the unemployment rate:

(1) for the country as a whole;
(2) for districts A and B separately, crude rate;
(3) for districts A and B separately, standardized rate.


(1) Whole country                 (2) Crude rates

                                     District A               District B
Numbers    %      Unemployed      Numbers   %     Unemployed  Numbers   %    Unemployed
  240      4         9.6            300     3.5     10.5        200     3       6
  360      7.5      27.0            320     9       28.8        350     8      28
  290     12.5      36.25           325    11.5     37.37       300    15      45
  110     18        19.8             55    22       12.1        150    25      37.5
                    92.65                           88.77                     116.5

         92.65 per 1000                    88.77 per 1000            116.5 per 1000

(3) Standardized rates

Standard         District A                District B
numbers        %       Unemployed        %       Unemployed
  240          3.5        8.4            3          7.2
  360          9         32.4            8         28.8
  290         11.5       33.35          15         43.5
  110         22         24.2           25         27.5
                         98.35                    107.0

               98.35 per 1000            107.0 per 1000


A rate or ratio, such as the foregoing, is simply an average (i.e.,
average number of deaths per 1000 of the population), but before
such an average can convey any useful or intelligible information
it is necessary to ensure that "all the members of the numerator
have some common characteristic and that all the members of the
denominator have another common characteristic, and that these
characteristics have some relationship with one another".
A standardized death, or other, rate complies with the above
requirements and affords a method of making a useful comparison
between groups heterogeneous with respect to age and sex distribution:
a crude rate merely gives the information, if the population
is known, that so many people have died, or been unemployed,
within a given period.

The crude birth-rate is variously given as: 

(1) the total births per thousand of the population; 

(2) the total births per thousand women aged 15-44; 

(3) legitimate births per thousand married women aged 
16-44. 

The first obviously violates every canon, though it may have its 
uses, as may similar statements such as " consumption of whisky 
per head of population ". 

The second eliminates the worst feature of the first, since at least 
“ any member of the denominator is capable of affording an instance 
of the numerator ", but before two such rates can be compared, the 
specific age-group rates must be standardized on some common age 
distribution of women between the ages of 15 and 44. 

The upper limit, 44 years of age, is arbitrarily chosen; 49 years 
of age is sometimes adopted; but in either case the number of births, 
when the mother is over these ages, is a very small proportion of the 
whole. 

In estimating population trends much importance has recently 
been attributed to “ gross ” and “ net ” reproduction rates. 
The meaning of these terms is illustrated in the following example. 

Gross and Net Reproduction Rates 

Example VI. Complete columns (4) and (6) in the following table, 
and hence obtain values of the gross and net reproduction rates: 


Number of Women in Age Groups and Number of Female Children
Born in One Year

(1)        (2)           (3)         (4)            (5)         (6)
Age        Female        Female      Female re-     Survival    Female
range      population    births      productive     rate        offspring of
           ('000)                    rate                       survivors
15-19        1558         18,900                     0.914
20-24        1112         71,100                     0.899
25-29        1595         96,900                     0.884
30-34        1629         64,200                     0.868
35-39        1627         34,900                     0.852
40-44        1522         10,800                     0.834
45-49        1401            800                     0.813

(L.U. B.Sc. Econ., 1946. Data adapted)


Col. (4) is obtained by dividing items in col. (3) by corresponding
items in col. (2).

The gross reproduction rate is the sum of col. (4) multiplied by 5,
i.e., by the number of years in the age range in col. (1). This gross
reproduction rate is the average number of girls borne by one married
woman throughout her reproductive life, which is here taken to be
between 15 and 50 years. In the above example the G.R.R. =
0.205 × 5 = 1.025.

The average number of potential mothers who survive is called the
net reproduction rate.

Column (6) is obtained by multiplying items in col. (4) by items in
col. (5). The net reproduction rate is then the sum of col. (6) multiplied
by 5. In the above example N.R.R. = 0.181 × 5 = 0.905.

If the net reproduction rate is below 1 and remains below 1, so that 
each mother produces less than one mother per generation, the popula¬ 
tion in question is headed for extinction no matter what the current 
birth and death rates may be. 


Age range    Col. (4)    Col. (6)
15-19         0.012       0.011
20-24         0.064       0.057
25-29         0.061       0.054
30-34         0.039       0.034
35-39         0.021       0.018
40-44         0.007       0.006
45-49         0.001       0.001
Sum           0.205       0.181
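The calculation of Example VI may be sketched in Python as below. The variable names are illustrative. The rates are carried unrounded here, so the results agree with the worked figures (1.025 and 0.905) only to within rounding, since the text rounds each entry of cols. (4) and (6) to three decimals before summing.

```python
# Data of Example VI: one row per five-year age range, 15-19 to 45-49
female_population = [1558, 1112, 1595, 1629, 1627, 1522, 1401]   # thousands
female_births = [18_900, 71_100, 96_900, 64_200, 34_900, 10_800, 800]
survival_rate = [0.914, 0.899, 0.884, 0.868, 0.852, 0.834, 0.813]

YEARS_PER_GROUP = 5

# Col. (4): female births per woman per year in each age range
col4 = [b / (1000 * p) for b, p in zip(female_births, female_population)]
# Col. (6): col. (4) reduced by the chance of surviving to that age
col6 = [r * s for r, s in zip(col4, survival_rate)]

grr = YEARS_PER_GROUP * sum(col4)   # gross reproduction rate
nrr = YEARS_PER_GROUP * sum(col6)   # net reproduction rate
print(round(grr, 3), round(nrr, 3))
```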


Estimates of the future population of the United Kingdom made 
during the 1930s were, it appears, unduly pessimistic, since they 
were based on the assumed continuance of the then very low net 
reproductive rates which obtained in the United Kingdom, and 
indeed in all the countries of Western Europe and in the United 
States. The reader should consult the recently published Report 
of the Royal Commission on Population for later views on this 
subject. 

III. Time Series 

The natures of (a) a cumulated series, (b) a progressive average,
(c) a moving total, (d) a moving average, are shown in the following
table. The period in col. (1) may be any division of time (days,
weeks, months, years, etc.), and the variable, col. (2), may be, e.g.,
the periodic production, or sales, of any commodity.

(1)       (2)        (3)                    (4)                        (5)                (6)
Period    Variable   Cumulated series       Progressive average        Three-period       Three-period
                                                                       moving total       moving average
1         $x_1$      $x_1$                  $x_1$                      -                  -
2         $x_2$      $x_1+x_2$              $\tfrac12(x_1+x_2)$        $x_1+x_2+x_3$      $\tfrac13(x_1+x_2+x_3)$
3         $x_3$      $x_1+x_2+x_3$          $\tfrac13(x_1+x_2+x_3)$    $x_2+x_3+x_4$      $\tfrac13(x_2+x_3+x_4)$
4         $x_4$      $x_1+x_2+x_3+x_4$      $\tfrac14(x_1+\ldots+x_4)$ $x_3+x_4+x_5$      $\tfrac13(x_3+x_4+x_5)$
etc.

The moving total and the moving average are centred at the middle
of the period to which they refer; i.e., for a three-period moving
total or moving average the first entry in col. (5) or col. (6) is
opposite $x_2$; and for a five-period moving total or moving average
opposite $x_3$. For a four-period moving total or moving average the
first entry in col. (5) or col. (6) must be placed between $x_2$ and $x_3$,
the second entry between $x_3$ and $x_4$, and so on.

Example VII. 

Monthly Production 1938
('000 tons)

Month:                  Jan.   Feb.   Mar.    Apr.   May    June
'000 tons produced:     150    200    206     194    310    320

Month:                  July   Aug.   Sept.   Oct.   Nov.   Dec.
'000 tons produced:     250    260    280     320    300    250

Make a table showing:

(a) the monthly figures;
(b) the cumulative production;
(c) the progressive average production;
(d) the three-monthly moving total;
(e) the three-monthly moving average;
(f) the four-monthly moving average.




All figures in '000 tons. Each four-monthly moving average lies
between the month against which it is printed and the following month.

          Monthly    Cumulative   Progressive   3-monthly   3-monthly   4-monthly
          produc-    produc-      average       M.T.        M.A.        M.A.
Month     tion       tion         production
Jan.        150         150        150             -           -           -
Feb.        200         350        175            556         185.3       187.5
Mar.        206         556        185 1/3        600         200         227.5
Apr.        194         750        187.5          710         236.7       257.5
May         310        1060        212            824         274.7       268.5
June        320        1380        230            880         293.3       285.0
July        250        1630        232 6/7        830         276.7       277.5
Aug.        260        1890        236 1/4        790         263.3       277.5
Sept.       280        2170        241 1/9        860         286.7       290.0
Oct.        320        2490        249            900         300         287.5
Nov.        300        2790        253 7/11       870         290          -
Dec.        250        3040        253 1/3         -           -           -
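The columns of Example VII can be generated with a few lines of Python. This is a sketch only; `moving_totals` is an illustrative helper name, and the lists returned are not padded, so the first three-monthly entry belongs to February and the first four-monthly entry lies between February and March.

```python
from itertools import accumulate

# Monthly production, Jan. to Dec. 1938 ('000 tons)
production = [150, 200, 206, 194, 310, 320, 250, 260, 280, 320, 300, 250]

# (b) cumulative production and (c) progressive average
cumulative = list(accumulate(production))
progressive = [c / (i + 1) for i, c in enumerate(cumulative)]

def moving_totals(series, k):
    """Totals of every run of k consecutive items."""
    return [sum(series[i:i + k]) for i in range(len(series) - k + 1)]

mt3 = moving_totals(production, 3)                    # (d) centred on Feb., Mar., ...
ma3 = [t / 3 for t in mt3]                            # (e)
ma4 = [t / 4 for t in moving_totals(production, 4)]   # (f) centred between months

print(cumulative)
print(mt3)
print(ma4)
```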



Moving Averages

Example VIII.

Percentage Unemployed Among Insured Persons. Average for
the Year (Great Britain)

Year     Males    Females        Year     Males    Females
1922     16.1      8.7           1930     16.4     14.4
1923     12.4      9.0           1931     22.4     17.7
1924     10.8      8.5           1932     25.1     13.5
1925     12.0      8.1           1933     23.0     11.2
1926     13.2      9.5           1934     19.1      9.8
1927     10.9      6.2           1935     17.8      9.4
1928     12.2      6.7           1936     14.6      8.3
1929     11.5      7.2           1937     11.8      7.7

Represent on one graph the series for males and that for females.
Smooth the series for males by a five-yearly moving average.

(L.U. Inter. Com.)


         Annual      5-yearly    5-yearly    Deviation of
         figures     moving      moving      annual figure
Year     (males)     total       average     from M.A.
1922      16.1         -           -            -
1923      12.4         -           -            -
1924      10.8        64.5       12.90        -2.1
1925      12.0        59.3       11.86        +0.1
1926      13.2        59.1       11.82        +1.4
1927      10.9        59.8       11.96        -1.1
1928      12.2        64.2       12.84        -0.6
1929      11.5        73.4       14.68        -3.2
1930      16.4        87.6       17.52        -1.1
1931      22.4        98.4       19.68        +2.7
1932      25.1       106.0       21.20        +3.9
1933      23.0       107.4       21.48        +1.5
1934      19.1        99.6       19.92        -0.8
1935      17.8        86.3       17.26        +0.5
1936      14.6         -           -            -
1937      11.8         -           -            -


It is not necessary to include the moving total column; but with 
small figures, as in the above example, its introduction makes it easy to 
do all the necessary calculations mentally and therefore saves time. 

Let the annual figures be represented by $x_1, x_2, x_3$, etc., the
3-yearly moving totals by $t_1, t_2, t_3$, etc., and the 3-yearly moving
averages by $a_1, a_2, a_3$, etc.

Then the first moving total, centred at $x_2$, is given by

$t_1 = (x_1 + x_2 + x_3)$

and the second by $t_2 = (x_2 + x_3 + x_4)$.

$\therefore$ obviously $t_2 = t_1 + (x_4 - x_1)$

which gives the quick method for obtaining moving totals.

If the figures are reasonably small a similar method may be used
to obtain the moving averages directly.

$a_1 = \tfrac13(x_1 + x_2 + x_3)$
$a_2 = \tfrac13(x_2 + x_3 + x_4)$
$\therefore a_2 = a_1 + \tfrac13(x_4 - x_1)$

$a_3 = a_2 + \tfrac13(x_5 - x_2)$, and so on.
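The quick method translates directly into a running update. The sketch below uses an illustrative function name (`moving_totals_quick`) and works for any span k, not only three; it is shown on the male unemployment figures of Example VIII.

```python
def moving_totals_quick(series, k):
    """Moving totals of span k, each obtained from the one before it."""
    if k > len(series):
        return []
    total = sum(series[:k])          # the first total is summed in full
    totals = [total]
    for i in range(k, len(series)):
        total += series[i] - series[i - k]   # add the new item, drop the oldest
        totals.append(total)
    return totals

males = [16.1, 12.4, 10.8, 12.0, 13.2, 10.9, 12.2, 11.5,
         16.4, 22.4, 25.1, 23.0, 19.1, 17.8, 14.6, 11.8]   # 1922 to 1937

totals = [round(t, 1) for t in moving_totals_quick(males, 5)]
print(totals)
```

With decimal data the running update accumulates small floating-point errors, which is why the totals are rounded before printing.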

The equation $y = ax + b$, where a and b are constants, represents
a straight line.

If a series of values of y is obtained by giving x successive values
at equal intervals, and a moving average calculated, the points on
the moving average are merely points on the original line.

If, however, a moving average be calculated for values of y,
similarly obtained from the equation $y = a + bx + cx^2$, which is
the equation of a parabola, none of the points on the moving average
line will lie on the original curve.

If the coefficient of $x^2$ is positive, the function has a minimum
value, and the parabola is consequently concave upwards. In
this case the moving average line will lie wholly above the original
curve. If, on the other hand, the coefficient of $x^2$ is negative, the
function has a maximum value and the parabola is concave downwards;
in this case the moving average line will lie wholly below the original
curve.
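The statement about the parabola can be verified for a three-point moving average. The following is a short derivation, not taken from the text, in which h denotes the (assumed constant) interval between successive values of x:

```latex
\begin{aligned}
\tfrac13\,\bigl[\,y(x-h) + y(x) + y(x+h)\,\bigr]
  &= a + bx + \tfrac{c}{3}\,\bigl[(x-h)^2 + x^2 + (x+h)^2\bigr] \\
  &= a + bx + cx^2 + \tfrac23\,c\,h^2 \\
  &= y(x) + \tfrac23\,c\,h^2 .
\end{aligned}
```

The moving average therefore differs from the curve by the constant amount $\tfrac23 ch^2$, which has the sign of c; with c = 0 (a straight line) the difference vanishes.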

Example IX. Calculate the values of y for integral values of x, from
x = 0 to x = 10, from the equation $y = \tfrac12 x^2 - 3x + 4$. Calculate a
3-point moving average of the values of y and plot the original values
and the moving average on the same diagram.

Note that the moving average values are in each case greater than
the original values.

Example X. Go through the same process when $y = 10x - x^2$,
taking integral values of x from x = 0 to x = 10.

Note that in this case the moving average values are less than the
corresponding original values by $\tfrac23$.

Example XI. The following figures represent a regular periodic series:

x (= period):   1   2   3   4   5   6   7   8   9  10  11  12  13
y:             20  25  30  31  30  25  20  15  10   9  10  15  20

and so on.

Calculate, from x = 1 to x = 21, (i) a 12-period moving average;
(ii) a 5-period moving average; (iii) a 6-period moving average; (iv) a
9-period moving average, and (v) a 15-period moving average.

Represent the original figures and all the above moving average lines
on one diagram and note:

(a) the 12-period moving average is a straight line, with
equation y = 20;

(b) the 5-, 6-, and 9-period moving averages have the same hills
and hollows as the original figures, but with progressively closer
approach to the straight line trend;

(c) the 15-period moving average is flatter still, but it reverses
the hills and hollows of the original series.
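Observations (a) and (c) of Example XI can be checked numerically. The sketch below assumes that the series repeats with period 12 indefinitely ("and so on") and evaluates centred averages at a crest (x = 16) and a trough (x = 22); the helper names are illustrative.

```python
CYCLE = [20, 25, 30, 31, 30, 25, 20, 15, 10, 9, 10, 15]   # y for x = 1 .. 12

def y(x):
    """Value of the periodic series at period x (assumed to repeat every 12)."""
    return CYCLE[(x - 1) % 12]

def centred_ma(x, k):
    """k-period moving average centred at x, for odd k."""
    half = k // 2
    return sum(y(t) for t in range(x - half, x + half + 1)) / k

def ma12(start):
    """Average of the 12 consecutive terms beginning at x = start."""
    return sum(y(t) for t in range(start, start + 12)) / 12

print([ma12(s) for s in range(1, 10)])                 # constant: the trend y = 20
print(centred_ma(16, 5), centred_ma(22, 5))            # crest high, trough low
print(round(centred_ma(16, 15), 2), round(centred_ma(22, 15), 2))   # reversed
```

At the crest the 15-period average falls below 20 and at the trough it rises above 20, which is the reversal described in (c).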

Choice of Period for Moving Average 

Where annual figures, exhibiting cyclic fluctuation, show a
regular, or fairly regular, periodicity of, say, n years (i.e., n years
is the approximate time between successive “crests” or successive
“troughs”), then an n-yearly moving average will naturally be
chosen, or perhaps an integral multiple of n, if n be fairly short. If
the period chosen is less than n years, the moving average line will
be less regular than if n be chosen, while a period of 2n or 3n years
will give a smoother trend line. If the period chosen be greater
than n but less than 2n, and so on, then the crests and valleys of the
original series will tend to be reversed in the trend line, and such a
choice should accordingly not be made.

Example XII. The following annual series has a 5-year period. 
Calculate the 5-yearly moving average and plot it with the original 
figures on the same graph. 

Then calculate a 7-yearly moving average for the series, and plot 
it on the diagram. The “ reversal " effect referred to above will then 
be clearly seen. 



         Annual               Annual               Annual
Year     figure      Year     figure      Year     figure
  1       110         11       130         21       146
  2       104         12       127         22       142
  3        98         13       122         23       138
  4       105         14       118         24       135
  5       109         15       130         25       145
  6       120         16       140         26       155
  7       115         17       135         27       150
  8       110         18       130         28       148
  9       114         19       127         29       ...
 10       122         20       135         30       156
                                           31       160



Most actual series, unfortunately, depart more or less widely from 
regularity, and only experience can lead to a wise choice; the 
period adopted will naturally be that which best smooths the 
series. A long series will probably necessitate the use of two or 
more distinct periods, as any regular pattern which may appear in 
the earlier part may have been destroyed or drastically changed by 
wars and their after-effects. 

Example VIII illustrates the calculation of the moving average 
when the arithmetic mean is employed, and the actual deviations 
of the annual figures from the trend line measured. A somewhat
longer, but on the whole more satisfactory procedure is to employ
the geometric mean for determining the trend line and to calculate 
the “ ratios to trend ” of the annual figures. This method is illus¬ 
trated below with the data of Example VIII. 

Example XIII. Smooth the series for males, using a five-yearly 
moving average (geometrical mean), and obtain the ratios to trend. 


         Logs of     5-yearly                                Moving      Ratios to
         annual      total of    Log         Log             average     trend (per-
Year     figures     logs        trend       ratios          (G.M.)      centages)
(1)      (2)         (3)         (4)         (5)
1922     1.2068        -           -           -               -            -
1923     1.0934        -           -           -               -            -
1924     1.0334      5.5334      1.1067      $\bar{1}.9267$   12.78        84.5
1925     1.0792      5.3640      1.0728      0.0064           11.82       101.5
1926     1.1206      5.3570      1.0714      0.0492           11.79       112.0
1927     1.0374      5.3843      1.0769      $\bar{1}.9605$   11.93        91.3
1928     1.0864      5.5199      1.1040      $\bar{1}.9824$   12.71        96.0
1929     1.0607      5.7495      1.1499      $\bar{1}.9108$   14.12        81.4
1930     1.2148      6.1118      1.2224      $\bar{1}.9924$   16.69        98.3
1931     1.3502      6.3871      1.2774      0.0728           18.94       118.2
1932     1.3997      6.6074      1.3215      0.0782           20.96       119.8
1933     1.3617      6.6430      1.3286      0.0331           21.31       107.9
1934     1.2810      6.4572      1.2914      $\bar{1}.9896$   19.56        97.6
1935     1.2504      6.1294      1.2259      0.0245           16.82       105.8
1936     1.1644        -           -           -               -            -
1937     1.0719        -           -           -               -            -


Col. (3). Totals of col. (2) for 5 years, centred at mid-year.

Col. (4). Figures in col. (3) divided by 5, giving the log G.M. of each
5-year period.

Col. (5). Col. (2) minus col. (4), giving log ratios of annual figures to the
trend.
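The geometric-mean trend and the ratios to trend of Example XIII can be obtained without tables of logarithms. The sketch below (function name `geometric_trend` is illustrative) follows the same steps as cols. (2) to (5); because it uses full-precision logarithms the results may differ from the printed four-figure values in the last place.

```python
import math

males = [16.1, 12.4, 10.8, 12.0, 13.2, 10.9, 12.2, 11.5,
         16.4, 22.4, 25.1, 23.0, 19.1, 17.8, 14.6, 11.8]   # 1922 to 1937

def geometric_trend(series, k=5):
    """Centred k-term geometric moving average (k odd); None where undefined."""
    logs = [math.log10(v) for v in series]          # col. (2)
    half = k // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        log_mean = sum(logs[i - half:i + half + 1]) / k   # cols. (3) and (4)
        trend[i] = 10 ** log_mean
    return trend

trend = geometric_trend(males)
# Ratio of each annual figure to the trend, as a percentage
ratios = [None if t is None else 100 * v / t for v, t in zip(males, trend)]

for year, t, r in zip(range(1922, 1938), trend, ratios):
    if t is not None:
        print(year, round(t, 2), round(r, 1))
```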


Seasonal Fluctuations 

Example XIV. Compute the average seasonal movement in the 
following series : 

Quarterly Production
('000 tons)

Year      (i)     (ii)    (iii)    (iv)
1924      3.5     3.9     3.4      3.6
1925      3.5     4.1     3.7      4.0
1926      3.5     3.9     3.7      4.2
1927      4.0     4.6     3.8      4.5
1928      4.1     4.4     4.2      4.5


Explanation of the Calculations Shown Below

Col. (iii). This is a four-quarterly moving total, the successive items
being centred between the successive quarterly items of
col. (ii), starting at the middle of 1924.

Col. (iv). The first two items of col. (iii) are added and their sum
placed between them, i.e., opposite the third quarter of
1924; the sum of the second and third items is placed
opposite the fourth quarter of 1924, and so on.

Col. (v). The figures here are obtained by dividing the items of
col. (iv) by 8.
The column shows the “trend” of production, i.e., a
four-quarterly moving average of the production series.

Col. (vi). Subtract the trend figures from the original quarterly
figures, col. (ii), and place the results under the
appropriate sign.
This column gives the “deviations from the trend”.

Col. (vii). This gives the production figures "corrected for seasonal
variation".

Col. (viii). The figures here are the differences between the corrected
production figures and the trend, col. (v); that is, they
are “residual” divergencies from the trend after the
seasonal corrections have been made, and may be
supposed to be the result of random causes outside the
general conditions which produce the seasonal effect.



All figures in '000 tons. Each entry of col. (iii) is centred between the
quarter against which it is printed and the following quarter.

Year       Quarterly   Moving    Sum of              Devia-    Quarterly     Residuals
and        produc-     total     pairs     Trend     tions     production    col. (vii) -
quarter    tion                                                "corrected"   col. (v)
(i)        (ii)        (iii)     (iv)      (v)       (vi)      (vii)         (viii)
1924 I      3.5          -         -         -         -         -             -
     II     3.9         14.4       -         -         -         -             -
     III    3.4         14.4      28.8      3.60     -0.20       -             -
     IV     3.6         14.6      29.0      3.62     -0.02       -             -
1925 I      3.5         14.9      29.5      3.69     -0.19      3.68         -0.01
     II     4.1         15.3      30.2      3.78     +0.32      3.86         +0.08
     III    3.7         15.3      30.6      3.82     -0.12      3.93         +0.11
     IV     4.0         15.1      30.4      3.80     +0.20      3.83         +0.03
1926 I      3.5         15.1      30.2      3.78     -0.28      3.68         -0.10
     II     3.9         15.3      30.4      3.80     +0.10      3.66         -0.14
     III    3.7         15.8      31.1      3.89     -0.19      3.93         +0.04
     IV     4.2         16.5      32.3      4.04     +0.16      4.03         -0.01
1927 I      4.0         16.6      33.1      4.14     -0.14      4.18         +0.04
     II     4.6         16.9      33.5      4.19     +0.41      4.36         +0.17
     III    3.8         17.0      33.9      4.24     -0.44      4.03         -0.21
     IV     4.5         16.8      33.8      4.22     +0.28      4.33         +0.11
1928 I      4.1         17.2      34.0      4.25     -0.15       -             -
     II     4.4         17.2      34.4      4.30     +0.10       -             -
     III    4.2          -         -         -         -         -             -
     IV     4.5          -         -         -         -         -             -


To obtain the seasonal corrections proceed as follows:

Deviations from Trend

                        Quarter
Year          I          II         III        IV
1924          -          -         -0.20      -0.02
1925        -0.19      +0.32       -0.12      +0.20
1926        -0.28      +0.10       -0.19      +0.16
1927        -0.14      +0.41       -0.44      +0.28
1928        -0.15      +0.10        -          -

Sum:        -0.76      +0.93       -0.95      +0.62

Average:    -0.190     +0.232      -0.237     +0.155

The last figures above show the average deviation from the trend of
each quarter. Note that there are only four complete years, so that

$\dfrac{\text{Sum}}{4} = \text{average}$.

Before these averages are used as estimates of the seasonal variation
they should be adjusted so that their sum is zero.

The sum in the above case is -0.040; therefore $\dfrac{0.040}{4}$, i.e., 0.01,
must be added to each quarter, giving -0.180, +0.242, -0.227, +0.165.

Rounding these numbers off to two places of decimals we obtain
-0.18, +0.24, -0.23, +0.17 as the average quarterly deviations, i.e.,
the first quarter is, on the average, 0.18 below the trend, the second
quarter 0.24 above the trend, and so on.

The corrections therefore to be applied to the original production
figures, expressed in thousands, to eliminate the seasonal effect are:

+0.18, -0.24, +0.23, -0.17.

Col. (vii) shows the production figures, after this correction has been
effected, for the years 1925, 1926 and 1927.

The student must take care not to state the “quarterly corrections”
when the “average quarterly variations” are required, or vice versa.

When monthly figures of production, sales or the like are given,
the calculation of corrections to be applied to the monthly figures
to eliminate seasonal effects proceeds on lines similar to the
foregoing, a twelve-monthly moving average being, of course, employed
as the trend line.
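The whole procedure of Example XIV (centred trend, deviations, adjusted quarterly averages, corrected figures) can be sketched in Python as follows. The function names are illustrative. The trend is carried unrounded, so the seasonal figures agree with -0.18, +0.24, -0.23, +0.17 only to within about 0.01; the fourth comes out nearer +0.16.

```python
def centred_trend(series, period=4):
    """Centred moving average for an even period (cols. (iii) to (v))."""
    totals = [sum(series[i:i + period]) for i in range(len(series) - period + 1)]
    half = period // 2
    trend = [None] * len(series)
    for i in range(len(totals) - 1):
        # Two adjacent moving totals straddle item i + half
        trend[i + half] = (totals[i] + totals[i + 1]) / (2 * period)
    return trend

def seasonal_deviations(series, period=4):
    """Average deviation from trend for each season, adjusted to sum to zero."""
    trend = centred_trend(series, period)
    sums, counts = [0.0] * period, [0] * period
    for i, (value, t) in enumerate(zip(series, trend)):
        if t is not None:
            sums[i % period] += value - t
            counts[i % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    shift = sum(means) / period
    return [m - shift for m in means]

production = [3.5, 3.9, 3.4, 3.6,  3.5, 4.1, 3.7, 4.0,  3.5, 3.9, 3.7, 4.2,
              4.0, 4.6, 3.8, 4.5,  4.1, 4.4, 4.2, 4.5]    # 1924 I to 1928 IV

seasonal = seasonal_deviations(production)
# Corrected figure = original figure minus the seasonal deviation of its quarter
corrected = [v - seasonal[i % 4] for i, v in enumerate(production)]
print([round(s, 3) for s in seasonal])
```

For monthly data the same functions may be called with `period=12`.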


Use of the Geometric Mean 

In the foregoing example the arithmetic mean was used. There 
is much to be said, however, in favour of using the geometric mean, 
and, instead of measuring deviations from the trend line, calculating 
the ratios of the original quarterly, or other periodic, figures to the 
corresponding trend values. If logarithms are used throughout, 
the calculations are not particularly laborious, as will be seen from
the following example, which illustrates both methods. 

Example XV. 


Personal Expenditure on Consumers' Goods and Services at
Current Market Prices
(£ million)

Fuel
and light     1st quarter    2nd quarter    3rd quarter    4th quarter
1945               78             62             56             71
1946               84             64             61             82
1947               92             70             63             85
1948              100             81             72             96

[Monthly Digest of Statistics. Source: Central Statistical Office]



(a) Moving Average based on the Arithmetic Mean: Deviations from Trend

Each moving total is centred between the quarter against which it is
printed and the following quarter.

             Quarterly       Moving     Sum of
             expenditure     total      pairs      Trend     Deviations
1945 I           78            -          -          -          -
     II          62           267         -          -          -
     III         56           273        540        67.5      -11.5
     IV          71           275        548        68.5       +2.5
1946 I           84           280        555        69.4      +14.6
     II          64           291        571        71.4       -7.4
     III         61           299        590        73.7      -12.7
     IV          82           305        604        75.5       +6.5
1947 I           92           307        612        76.5      +15.5
     II          70           310        617        77.1       -7.1
     III         63           318        628        78.5      -15.5
     IV          85           329        647        80.9       +4.1
1948 I          100           338        667        83.4      +16.6
     II          81           349        687        85.9       -4.9
     III         72            -          -          -          -
     IV          96            -          -          -          -

                     I          II         III        IV
1945                 -          -         -11.5      +2.5
1946               +14.6       -7.4       -12.7      +6.5
1947               +15.5       -7.1       -15.5      +4.1
1948               +16.6       -4.9         -         -
Sum                +46.7      -19.4       -39.7     +13.1

Average quarterly
deviations:        +15.57      -6.47      -13.23     +4.37

The sum of the average quarterly deviations is +0.24; 0.06 must
therefore be subtracted from each deviation, giving the seasonal
corrections as -15.5, +6.5, +13.3, -4.3.





(£ million)

                                                   Quarterly
             Quarterly                             figures, cor-
             expendi-                 Seasonal     rected for
             ture        Trend        correction   seasonality    Residuals
1946 I          84        69.4         -15.5          68.5          -0.9
     II         64        71.4          +6.5          70.5          -0.9
     III        61        73.7         +13.3          74.3          +0.6
     IV         82        75.5          -4.3          77.7          +2.2

The original series has thus been analysed into parts due to the
influence of trend, seasonal and random variations respectively, i.e.:

Original figure = Trend + Season + Residual
e.g.,        84 = 69.4 + 15.5 - 0.9

(b) Moving Average Based on the Geometric Mean: Ratios
to Trend

The calculation proceeds as in (a), except that logarithms of numbers
replace the numbers themselves.

                                                                              Ratios of
                                                                              quarterly
           Quarterly                                                          expenditure
           expendi-    Moving                           Log        Trend,     to trend,
           ture        total      Sum of     G.M.       ratio to   G.M.       expressed as
           (logs)      of logs    pairs      (log)      trend      (£ mn.)    percentages
1945 I     1.8921        -          -          -          -            -          -
     II    1.7924      7.2840       -          -          -            -          -
     III   1.7482      7.3162    14.6002    1.8250   $\bar{1}.9232$   66.83      83.79
     IV    1.8513      7.3300    14.6462    1.8308   0.0205           67.74     104.8
1946 I     1.9243      7.3671    14.6971    1.8371   0.0872           68.73     122.3
     II    1.8062      7.4296    14.7967    1.8496   $\bar{1}.9566$   70.73      90.48
     III   1.7853      7.4691    14.8987    1.8623   $\bar{1}.9230$   72.83      83.75
     IV    1.9138      7.5080    14.9771    1.8721   0.0417           74.49     110.1
1947 I     1.9638      7.5220    15.0300    1.8787   0.0851           75.63     121.6
     II    1.8451      7.5376    15.0596    1.8824   $\bar{1}.9627$   76.28      91.77
     III   1.7993      7.5738    15.1114    1.8889   $\bar{1}.9104$   77.43      81.36
     IV    1.9294      7.6372    15.2110    1.9014   0.0280           79.69     106.7
1948 I     2.0000      7.6952    15.3324    1.9165   0.0835           82.50     121.2
     II    1.9085      7.7481    15.4433    1.9304   $\bar{1}.9781$   85.15      95.08
     III   1.8573        -          -          -          -            -          -
     IV    1.9823        -          -          -          -            -          -



Log Ratios to Trend

              I          II                III               IV
1945          -          -                 $\bar{1}.9232$    0.0205
1946        0.0872     $\bar{1}.9566$      $\bar{1}.9230$    0.0417
1947        0.0851     $\bar{1}.9627$      $\bar{1}.9104$    0.0280
1948        0.0835     $\bar{1}.9781$      -                 -
Sum         0.2558     $\bar{1}.8974$      $\bar{1}.7566$    0.0902
Average     0.0853     $\bar{1}.9658$      $\bar{1}.9189$    0.0301


The product of the average quarterly ratios to trend should be 1, i.e.,
the log of the product should be 0.0. In this case the sum of the logs
is 0.0001, so that no adjustment is necessary. If the sum of the logs
differs appreciably from 0, the adjustment consists of adding, or
subtracting, ¼ of the difference to, or from, the log of each quarterly
average.

Delogarizing the above we have:

Quarter    Ratio to trend        Quarter    Ratio to trend
I          1.217                 III        0.8296
II         0.9243                IV         1.072


Each quarterly figure must therefore be divided by the appropriate 
ratio to remove the seasonal effect. 

The original series has been analysed into parts (trend, seasonal
and random variations) as in (a). The results of the analysis may be
exhibited as below:


(1) In Logarithms

       Original figures  =  Trend   +  Seasonal  +  Residuals
I      1.9243            =  1.8371  +  0.0853    +  0.0019
II     1.8062            =  1.8496  +  1̄.9658    +  1̄.9908
III    1.7853            =  1.8623  +  1̄.9189    +  0.0041
IV     1.9138            =  1.8721  +  0.0301    +  0.0116


(2) In Actual Values

       Original figures  =  Trend  ×  Seasonal  ×  Residuals
I      84                =  68.73  ×  1.217     ×  1.004
II     64                =  70.73  ×  0.9243    ×  0.979
III    61                =  72.83  ×  0.8296    ×  1.01
IV     82                =  74.49  ×  1.072     ×  1.027






Quarterly Expenditure Corrected for Seasonal Variation

       Quarterly      Average ratio   Corrected quarterly
       expenditure.   to trend.       expenditure.          Trend.
I      84             1.217           69.02                 68.73
II     64             0.9243          69.23                 70.73
III    61             0.8296          73.53                 72.83
IV     82             1.072           76.50                 74.49


Seasonal Variation—the Method of Monthly (or Quarterly) Totals 
or Averages 

For the sake of brevity, a quarterly series is used to illustrate this 
method. The procedure in the case of a monthly series is essentially 
the same. 

Example XVI.

Quarterly Production ('000 tons)

                                                              % of each quarterly
                                                              total to average
Quarter   1924.  1925.  1926.  1927.  1928.  1929.   Total   quarterly total
I         3.5    3.5    3.5    4.0    4.1    4.2     22.8     94.12
II        3.9    4.1    3.9    4.6    4.4    4.6     25.5    105.26
III       3.4    3.7    3.7    3.8    4.2    4.3     23.1     95.36
IV        3.6    4.0    4.2    4.5    4.5    4.7     25.5    105.26

                                                     96.9    400.00

Average quarterly total = 24.225.


The last column of the table, whose sum should of course be 400,
gives the measure of seasonal variation. To eliminate the seasonal
effect, divide the production of a first quarter by 0.94, of a second
quarter by 1.05, and so on. One defect of this method is that it gives
undue weight to the data of years in which the greatest amount of
activity occurs. Another is that the method does not allow for the
trend. Consequently, where a trend (either rising or falling) exists,
there is an “annual trend increment” (or decrement) which causes
the earlier months of the year to be less than they would otherwise have
been, and the later months to be larger.
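
As a check on the arithmetic of Example XVI, the method of quarterly totals can be sketched in Python. The dictionary layout and names below are illustrative assumptions, not part of the original text.

```python
# Quarterly production ('000 tons), 1924-1929, from Example XVI.
production = {
    "I":   [3.5, 3.5, 3.5, 4.0, 4.1, 4.2],
    "II":  [3.9, 4.1, 3.9, 4.6, 4.4, 4.6],
    "III": [3.4, 3.7, 3.7, 3.8, 4.2, 4.3],
    "IV":  [3.6, 4.0, 4.2, 4.5, 4.5, 4.7],
}

# Total for each quarter over all the years.
totals = {q: sum(values) for q, values in production.items()}

# Average quarterly total (the mean of the four totals).
average_total = sum(totals.values()) / len(totals)

# Each quarterly total as a percentage of the average quarterly total.
seasonal_pct = {q: 100 * t / average_total for q, t in totals.items()}

for q, pct in seasonal_pct.items():
    print(q, round(totals[q], 1), round(pct, 2))
```

Dividing a quarter's figure by `seasonal_pct[q] / 100` removes the seasonal effect as measured by this method.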

Seasonal Variation—The Method of Link Relatives 

Link Relatives: Divide each quarterly figure by that of the
preceding quarter, and express the result as a percentage. These
percentages are known as link relatives.




Example XVII. The data of Example XVI will be used to illustrate 
this method. 

Link Relatives

                    I.       II.      III.     IV.      I.
1924                –        111.43    87.18   105.88   –
1925                97.22    117.14    90.24   108.11   –
1926                87.50    111.43    94.87   113.51   –
1927                95.24    115.00    82.61   118.42   –
1928                91.11    107.32    95.45   107.14   –
1929                93.33    109.52    93.48   109.30   –

A.M.:               92.88    111.97    90.64   110.39   –
Chain relatives:    100      111.97   101.49   112.03   104.05
Adjusted C.R.:      100      110.96    99.47   109.00   100
Seasonal index:     95.4     105.8     94.9    103.9    –


Each arithmetic mean link relative shows the average relation of 
each quarter to the preceding quarter. 

The next step is to relate the A.M. link relatives to a common base— 
e.g., the first quarter—which then becomes the starting point, or 100%. 

The 2nd quarter Chain Relative = the 2nd quarter link relative 
which was calculated to represent the 2nd quarter when the 1st 
quarter = 100. 

The 3rd quarter A.M. link relative represents the 3rd quarter when 
the 2nd quarter = 100. 

The 3rd quarter C.R. = 3rd quarter L.R. × 2nd quarter C.R. ÷ 100

= 90.64 × 1.1197.

The 4th quarter C.R. = 4th quarter L.R. × 3rd quarter C.R. ÷ 100

= 110.39 × 1.0149.

Similarly the 1st quarter A.M. link relative represents the 1st quarter 
when the 4th quarter = 100, which gives 

1st quarter C.R. = 92.88 × 1.1203 = 104.05.

This new 1st quarter chain relative, which will be seen in the sixth 
column of the table, does not usually equal the original 1st quarter C.R. 
—i.e., 100—owing to long-term trend and other influences. 

The difference divided by 4 (i.e., 4 quarters) is +1.01. The chain
relatives are accordingly adjusted by subtracting 1.01 from the C.R.
for the 2nd quarter, 2.02 from the C.R. for the 3rd quarter and 3.03
from the C.R. for the 4th quarter.

The series of adjusted chain relatives gives each quarter as a per¬ 
centage of the 1st quarter. 

The average of the “adjusted” C.R. row = 419.43/4 = 104.86.
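
The whole link-relative procedure (link relatives, their arithmetic means, chaining, the trend adjustment and the final indices) can be sketched as follows. The code is an illustrative assumption about how one might lay out the working, using the data of Example XVI; the names are invented here.

```python
from statistics import mean

# Quarterly production by year (Example XVI), quarters I-IV.
data = [
    [3.5, 3.9, 3.4, 3.6],  # 1924
    [3.5, 4.1, 3.7, 4.0],  # 1925
    [3.5, 3.9, 3.7, 4.2],  # 1926
    [4.0, 4.6, 3.8, 4.5],  # 1927
    [4.1, 4.4, 4.2, 4.5],  # 1928
    [4.2, 4.6, 4.3, 4.7],  # 1929
]
series = [x for year in data for x in year]

# Link relative: each quarter as a percentage of the preceding quarter.
links = [[] for _ in range(4)]
for i in range(1, len(series)):
    links[i % 4].append(100 * series[i] / series[i - 1])
am = [mean(q) for q in links]  # A.M. link relatives, quarters I-IV

# Chain the A.M. link relatives on to the 1st quarter = 100.
chain = [100.0]
for q in (1, 2, 3):
    chain.append(am[q] * chain[-1] / 100)
closing = am[0] * chain[3] / 100  # the second 1st-quarter chain relative

# Spread the discrepancy (closing - 100) evenly over the four steps.
step = (closing - 100) / 4
adjusted = [chain[q] - q * step for q in range(4)]

# Express each adjusted chain relative as a percentage of their average.
seasonal_index = [100 * a / mean(adjusted) for a in adjusted]
print([round(s, 1) for s in seasonal_index])
```

Small differences in the last decimal place from the hand-worked table are to be expected, since the code does not round at intermediate steps.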






Seasonal Indices 


1st quarter = 100 × (100/104.86) = 95.4

2nd quarter = 100 × (110.96/104.86) = 105.8.


The elimination of the effects of seasonal variation is achieved by 
dividing each quarterly figure by the appropriate seasonal index. 

This index, which is widely used in the U.S.A., gives reasonably 
satisfactory results. The chain multiplication, however, makes 
cumulative errors possible to an extent which cannot be estimated. 

There are many other methods of determining trends, ranging 
from the use of a piece of black thread, or transparent ruler, to fix 
a linear trend by eye, to most laborious and difficult calculations. 
It may be said that there is considerable difference of opinion as 
to whether the results achieved by the latter are at all commensurate 
with the labour involved. 

In Chapter VI a brief account is given of the “ least squares ” 
method of determining straight or curved “ lines of best fit ” to 
observed data. 


Exercise 2 


1. Compute the weighted average weekly earnings of the following groups
of wage-earners taken together. Use two or more sets of approximate weights
and compare the results.

                s.  d.                   s.  d.
(a) 18,229      63   8    (f) 22,923     75   5
(b) 35,271      64   6    (g) 21,891     60   1
(c) 36,304      55   9    (h) 22,606     55   4
(d) 38,190      78   3    (i) 17,274     74   3
(e) 30,767     105   7    (j) 14,895     77  10

2. Find the average receipts per car-mile for the whole system for 1934
and 1935 from the following data, using several sets of approximate weights
for comparison.

                               Receipts per car-
        Route.   Car-miles.    mile (pence).
1934    I        303,425       16.25
        II       377,511       16.37
        III      336,432       17.12
        IV       410,200       15.84
1935    I        295,623       17.32
        II       380,410       17.51
        III      315,226       18.26
        IV       218,010       15.75





3. The average price of wheat and the quantities sold at four markets are
given as follows:

           Average price
           per quarter,     Quantity sold
Market.    s.  d.           (quarters).
A          27   3           36,000
B          28   8            1,000
C          29   1           16,000
D          27   2           12,000

Find the mean price per quarter for the four markets, weighting each local
average with the quantity sold.

Would it be possible for the average price at each of the four markets to
rise and yet for the weighted mean price to fall? If so, under what conditions?

(L.U. Inter. Com.)

4. The weighted arithmetic mean of 125, 229 and 275 is 195. If the weights
of 125 and 229 are 7 and 5, respectively, find the weight of 275.

5. The weighted geometric mean of 125, 229 and 275 is 203. If the weights
of 125 and 275 are 3 and 4, respectively, find the weight of 229.

6. Find the simple average of Col. 2 and the weighted average, using the
quantities in Col. 3 as weights, and explain the difference between the two results.

1.               2.                3.
                 Price per yard    Quantity (10
Piece goods.     (pence).          million yards).
Unbleached       2.79              236
Bleached         3.17              205
Printed flags    3.09                5
Other sorts      3.27              118
Dyed in piece    4.54              115
Of dyed yarn     3.98               29

(L.U. Inter. Com.)

7.
Commodity    Weight
group.       of group.    Index.
(a)          4            113
(b)          5            140
(c)          3            120
(d)          –            130
(e)          –            108
(f)          –            105

             Σw = 25

The weighted arithmetic mean of the indices of the above six groups of
commodities is 117.68, and the sum of the weights of the six groups is 25.

Calculate:

(1) the weighted A.M. of the groups a, b, c, taken together;

(2) the weighted A.M. of the groups d, e, f, taken together.

8. In the foregoing table the weighted geometric mean of all the groups is 

116-9. 

Calculate— 

(1) the weighted G M. of the groups a, b, c, taken together, 

(2) the weighted G.M. of the groups d, e, f, taken together. 





Average annual 
earnings (rupees). 

263*5 
457*2 
244-8 
332*7 
194*2 
285*8 
361*9 

800 

Find the weighted annual earnings for the whole group. 

10. The populations of two towns are given as 876,942 and 690,272,
respectively, and their respective death rates as 14.3 and 18.2 per 1000. Find,
to the nearest whole number, the death rate for the two towns taken together.



11. Marks of Two Scholarship Candidates

                          Marks of
Subject.    Weight.    1st candidate.    2nd candidate.
A           1          70                80
B           3          65                64
C           2          58                56
D           4          63                60

Find the weighted average mark of each candidate. By what figure would
the 2nd candidate have had to increase his marks in subject B, all other marks
remaining the same, in order to have tied for 1st place?

12. Define a “weighted average”.

                             Total wages
Operatives.      Number.     in a week.
Skilled          100         £354
Semi-skilled     120         £354
Unskilled        150         £354

A flat increase of 4s. per week is now given. Find the average percentage
increase per head per week for each class of operative and for the total.

(L.U. Inter. Com.)

13.
                              Weights, proportional
Market.    Price per doz.     to amounts sold.
A          4s.                6
B          3s.                4
C          2s. 6d.            3
D          2s.                2

Calculate:

(i) the unweighted average price per doz. for the 4 markets;
(ii) the weighted average price per doz. for the 4 markets;
(iii) the weighted average price per doz. for the 4 markets, using
weights which are the reciprocals of those given above;
(iv) the weighted average price per doz. for the 4 markets using the
weights given in the table increased in each case by 50, i.e., 56, 54, 53, 52.


9. Industrial group 

and average daily 
no. employed (’000). 

(a) 265 

(b) 95 

(c) 86 

(d) 63 

{e) 88 

(/) 34 

(g) 169 





Explain the difference in the four averages.


14. Compute the simple average and the weighted average of the items
below.

Item.    Weight.        Item.    Weight.
124       9             172       2
112      23             102       1
113      16              85      46
128      14             143       2
146       4             113       1
151       6             153       5
110       7             108      11
 68       1             101      31

Explain the reason for the difference between the two averages.

(L.U. B.Sc. Econ., 1938)

15. Compute (1) the unweighted average (each district given equal
importance) and (2) the weighted average (each family given equal
importance) of the expenditure on rent.

District.    Rent (shillings).    No. of families.
A            15.0                  1
B            14.3                 11
C            13.9                 26
D            13.3                  7
E            13.2                  2
F            13.0                  4
G            12.0                 14
H            11.5                  9
I            11.3                 22

                                  96

(L.U. Inter. Com.)

16. Table showing the number of women in age groups and the number of
female children born in one year.

               No. of women    No. of female
Age group.     ('000).         children born ('000).
15-19          363              3.95
20-24          367             14.34
25-29          354             25.16
30-34          324             21.52
35-39          288             13.76
40-44          253              5.32
45-49          225              0.46

Compute for each age group the number of female children born per annum
per 1000 women, i.e., the fertility rate.

Assuming that the fertility rate is the same at each year of age in each
group and is unchanged over a generation, and assuming no deaths, compute
the average number of female children born to a woman during the reproductive
period (ages 15-50).

[Thus in the ages 15-20 the average number is 5 × 3.95 ÷ 363.]

(L.U. B.Sc. Econ., 1938)



17. You are given the following statistics of population and unemployment:

(a) Your country as a whole for a standardized age distribution; and

(b) The local administrative area in which you live.

Calculate:

(i) the standardized unemployment rate in the country as a whole;
(ii) the standardized rate of unemployment in the local area; and
(iii) the crude rate of unemployment in the local area.

                                      Age (years).
                             16-30.   30-45.   45-60.   60-     Total.
Standard population:
  Age constitution           250      350      300      100     1000
  Unemployment rate, %       5        8        12       15      –
Local population:
  Age constitution           300      300      350      50      1000
  Unemployment rate, %       4        9        12       20      –

(Soc. I.A.A.)


18 Calculate the crude and standard death-rates from the following data : 


Age 

group. 

Popula¬ 

tion 

('000). 

No. of 
deaths 

Specific 
death-rate 
per 1000 

Standard 
age distri¬ 
bution 
per 1000. 

No. of deaths : 
specific rates 
applied to 
standard 1000. 

0- 9 

I 21 

! 350 


221 


10-24 

30 

| 102 


298 


25-44 

37 

229 


285 


45-64 

17 1 

354 


149 


65 and over 

n 1 

1 

415 


47 

j 



19. Complete the following table and state (a) the crude and (b) the
standardized death-rates:

               Population              Specific   Age distri-   No. of deaths
               in age-      Number     rate per   bution of a   per standard
Age.           group.       of deaths. 1000.      standard      1000 age-
                                                  1000.         group.
Males:
0-4            2,110         30                    59
5-14           3,340          6                   109
15-34          7,320         16                   177
35-59          7,960         70                   121
60 and over    3,240        196                    34

Females:
0-4            2,010         27                    55
5-14           3,230          8                   102
15-34          7,310         20                   180
35-59          8,750         57                   122
60 and over    4,280        230                    41

Total:         49,550       660                   1000






20. Plot the following figures relating to U.K. Revenue (£ million).

         Jan.-Mar.    Apr.-June.    July-Sept.    Oct.-Dec.
1940     38.8         14.5          20.1          22.8
1941     51.0         24.5          32.2          37.2
1942     66.3         42.6          46.3          45.9
1943     81.9         41.3          51.9          47.0
1944     92.4         46.2          55.9          49.9

Compute the average seasonal movements of the figures in the above table.

(Inst. T., 1945)

21. The following figures represent monthly average production of pig iron
in the United Kingdom, 1924-38, in units of 10,000 tons.

1924.   1925.   1926.   1927.   1928.   1929.   1930.   1931.
609     522     205     608     551     632     516     314

1932.   1933.   1934.   1935.   1936.   1937.   1938.
298     345     497     535     640     708     564

Plot the figures on squared paper and then find and plot their moving
average. Give reasons for your choice of period.

(Soc. I.A.A.)


22. Cost of Living Index Numbers (July 1914 = 100)

Yearly Averages, 1922-45

1922    183        1930    158        1938    156
1923    174        1931    147½       1939    158
1924    175        1932    144        1940    184
1925    176        1933    140        1941    199
1926    172        1934    141        1942    200
1927    167½       1935    143        1943    199
1928    166        1936    147        1944    201
1929    164        1937    154        1945    203

Estimate the trend in this series with the aid of a seven-year moving
average. Plot the original series and the trend on a graph.

(L.U. Inter. Com.)

23. Quarterly Index Numbers of Production

Coal: 1924 = 100

Year.    Quarters:   I.      II.     III.    IV.
1936                 93.3    81.7    81.5    89.1
1937                 93.8    92.3    86.5    93.7
1938                 97.6    82.2    79.0    89.3

Obtain the average seasonal variation. Apply seasonal corrections to the
1937 indices.

(L.U. Inter. Com.)

24. Quarterly Output in Thousand Units

Quarter.    1930.   1931.   1932.   1933.   1934.
I           31      42      49      47      51
II          39      44      53      51      54
III         45      57      65      62      66
IV          36      45      55      50      58

Compute and apply the corrections for seasonal movement,

(a) Using the A.M. and deviations from trend.
(b) Using the G.M. and ratios to trend.
25. Tonnage of Shipping Entered with Cargoes at U.K. Ports

('00,000 tons)

         1st quarter.   2nd quarter.   3rd quarter.   4th quarter.
1927     134            153            163            151
1928     135            154            159            155
1929     132            159            177            159
1930     139            166            176            156
1931     133            153            167            150

Represent the above series graphically and compute the seasonal movement,
and eliminate it in 1931.

(L.U. B.Sc. Econ., 1932)

26. Imports into the United Kingdom

(Unit £1,000,000)

                     Quarters.
Years.    First.   Second.   Third.   Fourth.    Total.    Relative prices.
1930      283      258       244      260        1045      100
1931      241      210       208      204         863       81
1932      194      168       159      183         704       76
1933      159      162       168      189         678       72
1934      184      179       176      197         736       74
1935      179      182       182      219         762       76
1936      200      204       207      243         854       80

Compute the seasonal movement of imports. Allowing for normal seasonal
change, which quarter do you consider showed the smallest trade in 1932-33?

(L.U. Inter. Com.)

27. In the last column of the above table the relative prices (or average
values) of imports are shown as a percentage of the level in 1930. Apply
these figures to the total value of imports in each year so as to estimate the
change in quantity of imports over the period.

(L.U. Inter. Com.)

28. Estimate the seasonal variation in the series of Question 24 by the
method of quarterly totals.

29. Apply the link relative method of obtaining a measure of seasonal
variation to the data of Question 26.




CHAPTER III 


INDEX NUMBERS 

* 

I. Price Index Numbers 

For purposes of illustration various types of index number will be 


calculated from the following simple data. 

Table A

                                         Prices.
Commodity.   Unit.   Q₀     P₀        P₁        P₂        P₃
                            s.  d.    s.  d.    s.  d.    s.  d.
A            lb      3      2   0     2   6     2   3     2   6
B            1       4      2   6     3   0     2   9     3   0
C            doz.    2      3   0     2   6     2   9     3   3
D            lb      1      1   0     0   9     1   3     1   6

ΣP                          8   6     8   9     9   0    10   3


The following symbolism will be used:

Q₀ = quantity in base year, or period (number of units).

Q₀′, Q₀″, Q₀‴ represent number of units of commodities
1, 2, 3 . . . respectively in the base period.

P₀ = price in the base year.

P₁, P₂, P₃ = prices in subsequent years.

P₀′, P₀″, P₀‴ = prices of commodities 1, 2, 3 in base year.

Pᵣ′, Pᵣ″, Pᵣ‴ = prices of commodities 1, 2, 3 in the rth year.

Y₀ = base year. Y₁, Y₂, Y₃ = subsequent years.


Types of Index Number 
Simple Aggregate of Prices 


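Example I. ΣP₀ = 8/6, ΣP₁ = 8/9, ΣP₂ = 9/-, ΣP₃ = 10/3.

Calling ΣP₀ 100, i.e., making Y₀ the base year, gives us indices for
years Y₁, Y₂, Y₃:

$$(1)\quad 100\,\frac{\Sigma P_1}{\Sigma P_0} = 100 \times \frac{8/9}{8/6} = 102.9$$

$$(2)\quad 100\,\frac{\Sigma P_2}{\Sigma P_0} = 100 \times \frac{9/\text{-}}{8/6} = 105.9$$

$$(3)\quad 100\,\frac{\Sigma P_3}{\Sigma P_0} = 100 \times \frac{10/3}{8/6} = 120.6$$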
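
A short Python sketch of the simple aggregate of prices follows, with the Table A prices converted to pence. The helper `to_pence` and the other names are illustrative assumptions introduced here.

```python
def to_pence(shillings, pence=0):
    """Convert a price in shillings and pence to pence (12d. = 1s.)."""
    return 12 * shillings + pence

# Table A prices for Y0, Y1, Y2, Y3, in pence.
prices = {
    "A": [to_pence(2, 0), to_pence(2, 6), to_pence(2, 3), to_pence(2, 6)],
    "B": [to_pence(2, 6), to_pence(3, 0), to_pence(2, 9), to_pence(3, 0)],
    "C": [to_pence(3, 0), to_pence(2, 6), to_pence(2, 9), to_pence(3, 3)],
    "D": [to_pence(1, 0), to_pence(0, 9), to_pence(1, 3), to_pence(1, 6)],
}

# Aggregate of prices in each year, then each aggregate as % of the base year.
aggregates = [sum(p[year] for p in prices.values()) for year in range(4)]
index = [100 * a / aggregates[0] for a in aggregates]

print(aggregates)                      # pence: 8/6, 8/9, 9/-, 10/3
print([round(i, 1) for i in index])
```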


Although this index number is apparently “unweighted”, it should
be noted that the various items are in fact weighted according to the
unit for which the price is quoted, and that alterations in the units are,
in effect, alterations in the relative importance of items.

Effect of Changing the Unit for which the Price is Quoted 

Example II.

Data of Table A with Alterations in Units

Commodity.            Unit.    P₀         P₁         P₂          P₃
                               s.  d.     s.  d.     s.  d.      s.  d.
A                     oz       0   1½     0   1⅞     0   1 11/16 0   1⅞
B                     1        2   6      3   0      2   9       3   0
C                     each     0   3      0   2½     0   2¾      0   3¼
D                     lb       1   0      0   9      1   3       1   6

Aggregate of prices            3  10½     4   1⅜     4   4 7/16  4  11⅛
Index numbers                  100        106.2      112.8       127.2

Compare these results with those obtained above.
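
The effect of requoting can be checked with exact fractions. In this sketch (names and layout are assumptions made for illustration) commodity A is requoted per oz, taking 16 oz to the lb, and commodity C per item, taking 12 to the dozen; B and D are unchanged.

```python
from fractions import Fraction

# Table A prices in pence for Y0..Y3, in the original units.
table_a = {
    "A": [24, 30, 27, 30],   # per lb
    "B": [30, 36, 33, 36],
    "C": [36, 30, 33, 39],   # per dozen
    "D": [12, 9, 15, 18],    # per lb
}

# Divisors that convert each price to the new unit of quotation.
divisor = {"A": 16, "B": 1, "C": 12, "D": 1}

requoted = {
    name: [Fraction(p, divisor[name]) for p in series]
    for name, series in table_a.items()
}

# Simple aggregate index on the requoted prices.
aggregates = [sum(series[y] for series in requoted.values()) for y in range(4)]
index = [float(100 * a / aggregates[0]) for a in aggregates]

print([str(a) for a in aggregates])    # aggregates in pence, as exact fractions
print([round(i, 1) for i in index])
```

The prices themselves have not changed, yet the index moves from 102.9, 105.9, 120.6 to different values, because A and C now carry far less weight in the aggregate.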

Weighted Aggregate of Prices

On Y₀ as base (= 100).

$$\text{Index for } Y_1 = \frac{P_1'Q_0' + P_1''Q_0'' + P_1'''Q_0''' + \cdots}{P_0'Q_0' + P_0''Q_0'' + P_0'''Q_0''' + \cdots} \times 100 = 100\,\frac{\Sigma P_1 Q_0}{\Sigma P_0 Q_0}$$

The quantities in the base year are used as weights and the weights
are applied to the prices direct.

Example III. From the data of Table A (prices in shillings).

ΣP₀Q₀ = 6 + 10 + 6 + 1 = 23
ΣP₁Q₀ = 7.5 + 12 + 5 + 0.75 = 25.25
ΣP₂Q₀ = 6.75 + 11 + 5.5 + 1.25 = 24.5
ΣP₃Q₀ = 7.5 + 12 + 6.5 + 1.5 = 27.5

ΣP₁Q₀/ΣP₀Q₀ = 25.25/23 = 1.098
ΣP₂Q₀/ΣP₀Q₀ = 24.5/23 = 1.065
ΣP₃Q₀/ΣP₀Q₀ = 27.5/23 = 1.196.



The indices accordingly are:

Y₀ : 100
Y₁ : 109.8
Y₂ : 106.5
Y₃ : 119.6.

What these indices compare is the relative cost in different years of
purchasing the base-year quantities of the various commodities, at the
current year prices.
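
The base-weighted aggregate of Example III can be sketched in Python as follows. Prices are expressed in shillings as decimals (2s. 6d. = 2.5), and the function name is an assumption introduced for illustration.

```python
# Table A: base-year quantities and prices in shillings for Y0..Y3.
quantities = {"A": 3, "B": 4, "C": 2, "D": 1}
prices = {
    "A": [2.0, 2.5, 2.25, 2.5],
    "B": [2.5, 3.0, 2.75, 3.0],
    "C": [3.0, 2.5, 2.75, 3.25],
    "D": [1.0, 0.75, 1.25, 1.5],
}

def weighted_aggregate(year):
    """Cost of the base-year quantities at the prices of the given year."""
    return sum(prices[c][year] * quantities[c] for c in prices)

base_cost = weighted_aggregate(0)
index = [100 * weighted_aggregate(y) / base_cost for y in range(4)]

print([weighted_aggregate(y) for y in range(4)])   # 23, 25.25, 24.5, 27.5
print([round(i, 1) for i in index])
```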

Simple (“Unweighted”) Arithmetic Mean of Price Relatives

A “price relative” is the ratio between the price of a commodity
in any year and its price in the base year, the price in the base year
being represented by 100:

$$\text{i.e., } 100\,\frac{P_1'}{P_0'},\quad 100\,\frac{P_1''}{P_0''} \text{ are price relatives.}$$

The index for base year, Y₀, = 100.

$$\text{The index for } Y_1 = \frac{1}{n}\left(100\,\frac{P_1'}{P_0'} + 100\,\frac{P_1''}{P_0''} + \cdots\right), \text{ where } n \text{ is the number of commodities,}$$

$$= \frac{100}{n}\left(\frac{P_1'}{P_0'} + \frac{P_1''}{P_0''} + \cdots\right) = \frac{100}{n}\,\Sigma\frac{P_1}{P_0}.$$

$$\text{The index for } Y_2 = \frac{100}{n}\,\Sigma\frac{P_2}{P_0}, \text{ and so on.}$$

Example IV. From the data of Table A.

Price Relatives

Commodity.        Y₀      Y₁       Y₂       Y₃
A                 100     125      112.5    125
B                 100     120      110.0    120
C                 100      83⅓      91⅔     108⅓
D                 100      75      125      150

Totals            400     403⅓     439⅙     503⅓
Average = I.N.    100     100.8    109.8    125.8

If the above table be recalculated using Y₃ as base, the following
results are obtained:

Year.    Index Number.
Y₀        80.58
Y₁        81.73
Y₂        87.40
Y₃       100



It should be observed that the index numbers calculated on Y₀ as
base are not in the same proportion as those obtained by using Y₃ as
base:

i.e., 100 : 100.8 : 109.8 : 125.8 ≠ 80.58 : 81.73 : 87.40 : 100

i.e., 1 : 1.008 : 1.098 : 1.258 ≠ 1 : 1.014 : 1.084 : 1.241

This is what is meant by saying that this form of index does not
satisfy the “time reversal test”. If the index number of 1930 on
1924 as base is 150, it is natural to suppose that the index number of
1924 on 1930 as base will be 66⅔. With the form of index number
under discussion this is not so.

The cause of this is that although the simple arithmetic mean of price
relatives is called an unweighted average, each item is actually weighted
by the number 100/(Price in base year), i.e., 100/P₀′, 100/P₀″, etc., are
the implicit weights. Alteration of base year to a year with a different set
of prices, therefore, is simply equivalent to introducing a fresh set of
weights, with the non-reversibility of the index as a natural result.

The weights implicit in the above table when Y₀ = 100 are:

Commodity.    Weight.
A             100/2   = 50
B             100/2.5 = 40
C             100/3   = 33⅓
D             100/1   = 100

whereas when Y₃ = 100 they are:

Commodity.    Weight.
A             100/(2/6) = 40
B             100/(3/-) = 33⅓
C             100/(3/3) = 30.77
D             100/(1/6) = 66⅔

With Y₀ as base, therefore, the index number tells us the relative cost
in subsequent years of purchasing 50 units of A, 40 of B, 33⅓ of C and
100 of D; with Y₃ as base, the information obtained relates to the
comparative cost in other years of 40 units of A, 33⅓ of B, 30.8 of C
and 66⅔ of D.

It is therefore quite obvious why a change of base year causes an
alteration in the proportions between the indices of the several years.
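
The failure of the time reversal test, and the implicit weights behind it, can be illustrated with a short Python sketch. The function `am_index` and the data layout are assumptions made for this illustration; prices are the Table A prices in shillings.

```python
from statistics import mean

# Table A prices in shillings for Y0..Y3.
prices = {
    "A": [2.0, 2.5, 2.25, 2.5],
    "B": [2.5, 3.0, 2.75, 3.0],
    "C": [3.0, 2.5, 2.75, 3.25],
    "D": [1.0, 0.75, 1.25, 1.5],
}

def am_index(base):
    """Simple arithmetic mean of price relatives, each year on the given base."""
    return [mean(100 * p[y] / p[base] for p in prices.values()) for y in range(4)]

on_y0 = am_index(0)
on_y3 = am_index(3)

# Implicit weights: 100 divided by the base-year price of each commodity.
weights_y0 = {c: 100 / p[0] for c, p in prices.items()}
weights_y3 = {c: 100 / p[3] for c, p in prices.items()}

print([round(i, 1) for i in on_y0])
print([round(i, 2) for i in on_y3])
# The ratio Y3 : Y0 differs between the two bases, so the index is not reversible.
print(round(on_y0[3] / on_y0[0], 3), round(on_y3[3] / on_y3[0], 3))
```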





The well known " Sauerbeck ” or " Statist ” index number of 
wholesale prices is the arithmetic mean of the price relatives of forty- 
five commodities. 


Simple Geometric Mean of Price Relatives

Example V. From the data of Table A.

Price Relatives

Commodity.      Y₀      Y₁      Y₂       Y₃
A               100     125     112.5    125
B               100     120     110      120
C               100      83⅓     91⅔     108⅓
D               100      75     125      150

I.N. = G.M. =   100      98.4   109.1    124.9

$$\text{The index number for } Y_1 = \sqrt[4]{125 \times 120 \times 83\tfrac{1}{3} \times 75} = 98.4$$

Example VI. Using Y₃ as base.

Commodity.    Y₀       Y₁       Y₂       Y₃
A             80       100      90       100
B             83⅓      100      91⅔      100
C             92.31     76.92   84.62    100
D             66⅔       50      83⅓      100

G.M. =        80.04     78.74   87.34    100

Note that 100 : 98.4 : 109.1 : 124.9 = 80.04 : 78.74 : 87.34 : 100

i.e., 1 : 0.984 : 1.091 : 1.249 = 1 : 0.984 : 1.091 : 1.249

that is, an index number which is a simple geometric mean of price
relatives does satisfy the time reversal test.

The reason for this becomes clear when the index number is 
written down symbolically. 

With three commodities and using the above symbolism, with Y₀
as base the G.M. for Y₁

$$= \sqrt[3]{\frac{100P_1'}{P_0'} \times \frac{100P_1''}{P_0''} \times \frac{100P_1'''}{P_0'''}} = I.$$

With Y₁ as base the G.M. for Y₀

$$= \sqrt[3]{\frac{100P_0'}{P_1'} \times \frac{100P_0''}{P_1''} \times \frac{100P_0'''}{P_1'''}} = I_1,$$

but the product I × I₁ is clearly 10,000, since the price ratios under
the radical signs are reciprocals.

∴ 100 : I = I₁ : 100

i.e., the index is reversible.

This result is perfectly general. When the average used is the
geometric mean, I × I₁ is always 100².

Writing down the simple arithmetic mean of price relatives
symbolically we obtain:

$$Y_0 \text{ as base: Index for } Y_1 = \frac{100}{n}\left(\frac{P_1'}{P_0'} + \frac{P_1''}{P_0''} + \cdots + \frac{P_1^{(n)}}{P_0^{(n)}}\right) = I$$

$$Y_1 \text{ as base: Index for } Y_0 = \frac{100}{n}\left(\frac{P_0'}{P_1'} + \frac{P_0''}{P_1''} + \cdots + \frac{P_0^{(n)}}{P_1^{(n)}}\right) = I_1$$

Then 100 : I ≠ I₁ : 100 unless the product of the brackets = n²,
a condition which obviously could not obtain generally.
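
The reversibility of the geometric-mean index can be verified numerically. The sketch below uses the Table A prices in shillings; the function `gm_index` is a hypothetical helper written for this illustration.

```python
from math import prod

# Table A prices in shillings for Y0..Y3.
prices = {
    "A": [2.0, 2.5, 2.25, 2.5],
    "B": [2.5, 3.0, 2.75, 3.0],
    "C": [3.0, 2.5, 2.75, 3.25],
    "D": [1.0, 0.75, 1.25, 1.5],
}

def gm_index(base):
    """Simple geometric mean of price relatives, each year on the given base."""
    n = len(prices)
    return [
        prod(100 * p[y] / p[base] for p in prices.values()) ** (1 / n)
        for y in range(4)
    ]

on_y0 = gm_index(0)
on_y3 = gm_index(3)

print([round(i, 1) for i in on_y0])
print([round(i, 2) for i in on_y3])
# Time reversal: (Y3 on Y0) x (Y0 on Y3) should equal 100 squared.
print(round(on_y0[3] * on_y3[0], 6))
```

Because the code works from unrounded price relatives, the figures on Y₃ as base may differ from the hand-worked table in the second decimal place.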

The geometric mean of price relatives is used in the construction
of the important Board of Trade Index Numbers of Wholesale Prices.*
One advantage of using a “reversible” type of index number is
that a series of indices on any year as base can be transferred to
any other year of the series as base by dividing throughout by the
same number, e.g.:


Year.    Index (G.M.)
Y₀       100
Y₁        98.4
Y₂       109.1
Y₃       124.9

These indices are transferred to Y₃ as base by dividing each index
by 1.249. See page 78.

Arithmetic Mean of Price Relatives—Chain Base 

Here the price of each commodity in each year (excepting Y₀,
for which the index number is taken as 100) is expressed as a relative
of the price in the preceding year. The indices obtained for each
year on the previous year are afterwards “chained” together to
get another set of indices referring each year back to Y₀.

* The Board of Trade Index Number of Wholesale Prices, referred to above,
has now been superseded. New series of indices of a different character, based
on June 30, 1949 = 100, have appeared in the Journal since June 1951. Details
of the changes are given in the issues of May 19 and June 16, 1951.





Example VII. From the data of Table A.

Price Relatives (each year on the preceding year)

Commodity.      Y₀      Y₁       Y₂        Y₃
A               100     125       90       111⅑
B               100     120       91⅔      109 1/11
C               100      83⅓     110       118 2/11
D               100      75      166⅔      120

Total:          400     403⅓     458⅓      458.384
Average:        100     100.8    114.58    114.60

Chain Index:    100     100.8    115.5     132.36


The “averages” above have therefore the following significance:

If I.N. for Y₀ = 100, I.N. for Y₁ = 100.8
If I.N. for Y₁ = 100, I.N. for Y₂ = 114.58
If I.N. for Y₂ = 100, I.N. for Y₃ = 114.60

The chain indices are obtained as follows:

Y₁ already has Y₀ as base, and so needs no correction.

Let I₀₁ indicate the index number for Y₁ on base Y₀,
    I₁₂ indicate the index number for Y₂ on base Y₁, etc.

Then for common base Y₀:

I.N. for Y₁ = I₀₁
I.N. for Y₂ = I₀₁ × I₁₂ ÷ 100
I.N. for Y₃ = I₀₁ × I₁₂ × I₂₃ ÷ 100², and so on.

∴ I.N. for Y₂ = 100.8 × 1.1458 = 115.5
  I.N. for Y₃ = 100.8 × 1.1458 × 1.146 = 132.36

Index Number where the Price Relative of Each Commodity is 
Weighted with a Number Equal, or Proportional, to the Total 
Expenditure on that Commodity in the Base Year, i.e., a 
Weighted Average of Price Relatives 


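With Y₀ as base, the index number for Y₁

$$= 100\,\frac{P_0'Q_0'\,\dfrac{P_1'}{P_0'} + P_0''Q_0''\,\dfrac{P_1''}{P_0''} + P_0'''Q_0'''\,\dfrac{P_1'''}{P_0'''} + \cdots}{P_0'Q_0' + P_0''Q_0'' + P_0'''Q_0''' + \cdots}$$

$$= 100\,\frac{\Sigma\left(P_0Q_0 \times \dfrac{P_1}{P_0}\right)}{\Sigma(P_0Q_0)} = 100\,\frac{\Sigma Q_0P_1}{\Sigma Q_0P_0}$$

$$= 100\,\frac{Q_0'P_1' + Q_0''P_1'' + Q_0'''P_1''' + \cdots}{Q_0'P_0' + Q_0''P_0'' + Q_0'''P_0''' + \cdots}$$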






This is the same result as that given for the weighted aggregate 
of prices on page 75. The difference in method is that in the 
aggregative method the weights are the physical quantities and 
they are applied to the prices directly, whereas in the above case 
the weights are the total outlay on each commodity in the period, 
applied to the price relatives, or, what amounts to the same thing, 
to the percentage increase in price of each commodity as compared 
with the price in the base period. 

Example VIII. From the data of Table A.

               Weight            Price relatives.
Commodity.     (P₀ × Q₀)     Y₁       Y₂       Y₃
A               6            125      112.5    125
B              10            120      110      120
C               6             83⅓      91⅔     108⅓
D               1             75      125      150

               23

Then index for Y₁

= {(6 × 125) + (10 × 120) + (6 × 83⅓) + (1 × 75)}/23 = 2525/23 = 109.78.

Example IX. It is usually simpler to deal with the percentage
increase in price directly as follows:

(i)            (ii)         (iii)                    (iv)
                            % increase in price
                            in Y₁ as compared        Product: Col. (ii)
Commodity.     Weight.      with Y₀.                 by Col. (iii).
A               6            25                       150
B              10            20                       200
C               6           −16⅔                     −100
D               1           −25                       −25

               Σw = 23                               +225

Average percentage increase = 225/23 = 9.78

With Y₀ = 100, index number for Y₁ = 109.8
Similarly, index number for Y₂ = 106.5
           index number for Y₃ = 119.6.
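
That the weighted average of price relatives reproduces the weighted aggregate of prices can be confirmed with a few lines of Python. This is a sketch under the Table A data (prices in shillings); the function names are assumptions made for illustration.

```python
# Table A: base-year quantities and prices in shillings for Y0..Y3.
quantities = {"A": 3, "B": 4, "C": 2, "D": 1}
prices = {
    "A": [2.0, 2.5, 2.25, 2.5],
    "B": [2.5, 3.0, 2.75, 3.0],
    "C": [3.0, 2.5, 2.75, 3.25],
    "D": [1.0, 0.75, 1.25, 1.5],
}

# Weights: total expenditure on each commodity in the base year (P0 x Q0).
weights = {c: prices[c][0] * quantities[c] for c in prices}

def relatives_index(year):
    """Weighted average of price relatives, weights = base-year expenditure."""
    total = sum(weights[c] * 100 * prices[c][year] / prices[c][0] for c in prices)
    return total / sum(weights.values())

def aggregate_index(year):
    """Weighted aggregate of prices, weights = base-year quantities."""
    cost = sum(prices[c][year] * quantities[c] for c in prices)
    base = sum(prices[c][0] * quantities[c] for c in prices)
    return 100 * cost / base

for year in range(4):
    print(year, round(relatives_index(year), 2), round(aggregate_index(year), 2))
```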



The Ministry of Labour Index Number of Retail Prices based on
July 1914 was of this nature, i.e., it was a weighted average of price
relatives. The new Ministry of Labour “Interim Index of Retail
Prices” has as base date June 17, 1947. It has also a new set of
weights, based on an inquiry into working-class family allocation of
income undertaken in 1937-38, and a somewhat different grouping.



The old and the new groupings and weights are set down below for
comparison.

Ministry of Labour Retail Price Index Number
(“Cost-of-living Index”)

                                   Weights.
Group.                      1914.      1937-38 (proposed).
I.   Food                    60         39
II.  Rent and Rates          16         13
III. Clothing                12         11
IV.  Fuel and Light           8          7
V.   Miscellaneous            4         30

                            100        100

Ministry of Labour Interim Index of Retail Prices*

Group.                               1947 weights.
I.    Food                            348
II.   Rent and Rates                   88
III.  Clothing                         97
IV.   Fuel and Light                   65
V.    Household durable goods          71
VI.   Miscellaneous                    35
VII.  Services                         79
VIII. Drink and Tobacco               217

                                     1000

[Ministry of Labour Gazette]

Broadly speaking, the Ministry of Labour Index shows the cost 
of purchasing, at any subsequent time, a “ parcel ” of goods and 
services that would have cost £100 at the base date. 

It has been shown that the “ weighted aggregate of prices ” 
method and the " weighted average of price relatives ” method are 
alternative ways of arriving at the same result. Which of them will 
be used in any case depends on the form in which the data are 
available. If the quantities and prices in the base period and prices 
at each subsequent period are known, the aggregative method 
should be employed. If, however, only price relatives for each 
period and total expenditure in the base period are given, the other 
method must be used. 

* Important changes have been made in this Index Number. It is now
based “on the 1950 pattern of consumption relevant at January 1952 prices”.
For a description of the New Index see the Ministry of Labour Gazette for
March, 1952.







Miscellaneous Examples 

Example X. In calculating the Ministry of Labour Cost-of-living
Index the following weights were used: Food 7½, Rent 2, Clothing 1½,
Fuel and Light 1, Miscellaneous ½.

Calculate the index number for a date when the percentage increases
in prices of the various items over prices of July 1914 (= 100) were
21, 57, 90, 70 and 78 respectively.

(1)              (2)         (3)                (4)
                             % increase over    Product: Col. (2)
Item.            Weight.     July 1914.         by Col. (3).
Food              7½          21                 157.5
Rent              2           57                 114
Clothing          1½          90                 135
Fuel              1           70                  70
Miscellaneous     ½           78                  39

                 12½                             515.5

Average % increase = 515.5/12.5 = 41.2

Index number = 141.


Example XI. On a certain date the Ministry of Labour Retail Price 
Index was 204.6. Percentage increases in prices over July 1914 were: 
Rent and Rates 65, Clothing 220, Fuel and Light 110 and Miscellaneous 
125. What was the percentage increase in the Food Group? 

   Group.              % increase.    Weight.    Products. 
   Food                    ?            60           ? 
   Rent and Rates          65           16         1040 
   Clothing               220           12         2640 
   Fuel and Light         110            8          880 
   Miscellaneous          125            4          500 
                                       ---- 
                                       100 

Since the index number was 204.6, the average percentage increase 
over July 1914 was 104.6; therefore the total of the “products” 
column must be 104.6 × 100 = 10460. 

The four known products total 5060. 

∴ Percentage increase in food prices = (10460 − 5060)/60 
                                     = 5400/60 = 90. 
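The same back-calculation can be sketched in Python. The helper name `missing_group_increase` is hypothetical; it simply rearranges the weighted-average formula to solve for the one unknown percentage increase.

```python
def missing_group_increase(index, known_groups, missing_weight):
    """Solve for the % increase of the one group whose figure is unknown.

    known_groups is a list of (pct_increase, weight) pairs.
    """
    total_weight = missing_weight + sum(w for _, w in known_groups)
    required_total = (index - 100) * total_weight       # total of "products"
    known_total = sum(p * w for p, w in known_groups)   # products already known
    return (required_total - known_total) / missing_weight


# Rent and Rates, Clothing, Fuel and Light, Miscellaneous
known = [(65, 16), (220, 12), (110, 8), (125, 4)]
food_increase = missing_group_increase(204.6, known, missing_weight=60)
print(round(food_increase, 1))
```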








Example XII. 

                     Statist Index Number 
                     Average 1867-77 = 100 

                           No. of          Index in 
   Commodity group.        entries.     1913.    1938. 
   Food: 
     Cereals, etc.            8           69       81 
     Meat                     7           99      111 
     Sugar, etc.              4           54       43 
   Materials: 
     Minerals                 7          111      136 
     Textiles                 8           84       75 
     Sundry                  11           83       87 

Compute the index numbers for all groups for 1913 and 1938. 
Express the index in each group in 1913 as 100, compute the 
corresponding number for 1938 and recalculate the index for 1938. 

Do the two computations agree as closely as you would expect? 

                                              (L.U. Inter. Com.) 


1913. Since mean of 8 items = 69 
      and mean of 7 items = 99, etc., 

Mean of 45 items 
   = {(69 × 8) + (99 × 7) + (54 × 4) + (111 × 7) + (84 × 8) 
      + (83 × 11)}/45 
   = 3823/45 
   = 84.95. Say 85.0. 

Similarly index number for 1938 = 4106/45 = 91.24, say 91.2. 

                                 Index in 
   Group.    No. of items.    1913.     1938. 
     1            8            100      117.4  (i.e., 100 × 81/69) 
     2            7            100      112.1 
     3            4            100       79.6 
     4            7            100      122.5 
     5            8            100       89.3 
     6           11            100      104.8 
                ---- 
                 45 

Complete index for 1938 on 1913 as base 
   = (1/45){(117.4 × 8) + (112.1 × 7) + . . . + (104.8 × 11)} 
   = 105.9. 



                      Index for 1913.   Index for 1938.   Ratio. 
   On base 1867-77         85.0              91.2        1 : 1.073 
   On base 1913 = 100     100               105.9        1 : 1.059 

The agreement is thus fairly close, but it is quite fortuitous. If the 
reader will work out the index number for 1913 on 1938 as base, he will 
get the following results: 

   91.2 : 85.0 and 100 : 96, 
   i.e., 1 : 0.93 and 1 : 0.96. 

In general, greater discrepancies are to be expected. 
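A minimal Python sketch of the two computations in Example XII follows. The helper name `weighted_mean` is an assumption for illustration. Because the sketch works with unrounded group relatives, its results may differ from the worked figures in the last decimal place.

```python
def weighted_mean(values, weights):
    """Arithmetic mean of values, each weighted by its number of entries."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


entries = [8, 7, 4, 7, 8, 11]             # number of entries per group
index_1913 = [69, 99, 54, 111, 84, 83]    # group indices, 1867-77 = 100
index_1938 = [81, 111, 43, 136, 75, 87]

all_1913 = weighted_mean(index_1913, entries)
all_1938 = weighted_mean(index_1938, entries)

# First computation: ratio of the two all-groups indices.
ratio_of_means = 100 * all_1938 / all_1913

# Second computation: put each group on 1913 = 100, then average.
relatives = [100 * b / a for a, b in zip(index_1913, index_1938)]
mean_of_relatives = weighted_mean(relatives, entries)

print(round(all_1913, 2), round(all_1938, 2))
print(round(ratio_of_means, 1), round(mean_of_relatives, 1))
```

The two results differ because the average of ratios is not in general the ratio of averages.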
Example XIII. Data of Example XII. 

Calculate the index numbers for all groups for 1913 and 1938 (av. 
1867-77 = 100) on the assumption that a simple geometric mean is 
the form of average used. 

$$\text{Index for 1913} = \sqrt[45]{69^{8} \times 99^{7} \times 54^{4} \times 111^{7} \times 84^{8} \times 83^{11}}$$ 

i.e., $\log(\text{index}) = \tfrac{1}{45}(8\log 69 + 7\log 99 + \dots + 11\log 83) = 1.9207$, 

i.e., I.N. = 83.3. 

For 1938, $\log(\text{index}) = \tfrac{1}{45}(8\log 81 + 7\log 111 + \dots + 11\log 87)$, 

i.e., I.N. = 87.5. 
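The logarithmic working of Example XIII can be sketched in Python as follows. The function name `geometric_index` is illustrative; common logarithms are used to mirror the tables a reader of the text would consult.

```python
import math


def geometric_index(values, weights):
    """Weighted geometric mean, computed through common logarithms."""
    log_mean = sum(w * math.log10(v) for v, w in zip(values, weights)) / sum(weights)
    return 10 ** log_mean


entries = [8, 7, 4, 7, 8, 11]
index_1913 = [69, 99, 54, 111, 84, 83]
index_1938 = [81, 111, 43, 136, 75, 87]

gm_1913 = geometric_index(index_1913, entries)
gm_1938 = geometric_index(index_1938, entries)
print(round(gm_1913, 1), round(gm_1938, 1))
```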

Example XIV. 

            Price Relatives of Commodities 

   Year.      A       B       C 
   1934      100     100     100 
   1935      105      97     121 
   1936      110      94     125 
   1937      115     100     130 
   1938      116      99     128 
   1939      120     105     130 

(a) Recalculate the above table so that the price relatives in each 
year refer to the previous year as 100, and calculate the index for the 
three commodities for each year. 

(b) Chain the above indices so that the index for each year is referred 
to 1934 as base. 

(c) Obtain the simple A.M. index for each year on 1934 as base direct 
and compare results with those of (b). 


         Price Relatives of Commodities: Chain Base 

   Year.      A        B        C       A.M.    Chain index. 
   1934     100      100      100      100        100 
   1935     105       97      121      107.7      107.7 
   1936     104.8     96.9    103.3    101.7      109.5 
   1937     104.5    106.4    104      105.0      115.0 
   1938     100.9     99       98.5     99.5      114.4 
   1939     103.5    106.1    101.6    103.7      118.6 





Price Relatives, 1936 (1935 = 100): 

   A = 110 × 100/105 = 104.8 
   B =  94 × 100/97  =  96.9 
   C = 125 × 100/121 = 103.3 

Chain Indices: 

   1936   Index = 107.7 × 101.7/100 = 109.5 
   1937         = 109.5 × 1.05      = 115.0 
   1938         = 115.0 × 0.995     = 114.4 

                              1934    1935    1936    1937    1938    1939 
   Indices referred to 1934 
     as base directly         100    107.7   109.7   115.0   114.3   118.3 
   Chain indices              100    107.7   109.5   115.0   114.4   118.6 


The agreement is unusual; generally speaking, there is a considerable 
divergence: the farther it is from its base year the less reliable is the 
chain base likely to be. 

The reader should make the same comparison with other sets of price 
relatives. 
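To make that comparison with other data, the chaining procedure of Example XIV can be sketched in Python. The dictionary layout and variable names are illustrative assumptions. The sketch carries unrounded link relatives forward, so its chain indices differ slightly (by about 0.1) from the worked figures, which were rounded at each step.

```python
# Price relatives of commodities A, B, C (1934 = 100), from Example XIV.
relatives = {
    1934: (100, 100, 100),
    1935: (105, 97, 121),
    1936: (110, 94, 125),
    1937: (115, 100, 130),
    1938: (116, 99, 128),
    1939: (120, 105, 130),
}
years = sorted(relatives)

# (c) Simple A.M. of relatives referred directly to 1934.
direct = {y: sum(relatives[y]) / len(relatives[y]) for y in years}

# (a) and (b): link each year to the previous year, then chain the links.
chain = {years[0]: 100.0}
for prev, cur in zip(years, years[1:]):
    links = [100 * c / p for c, p in zip(relatives[cur], relatives[prev])]
    link_index = sum(links) / len(links)          # A.M. with previous year = 100
    chain[cur] = chain[prev] * link_index / 100   # refer back to 1934

for y in years:
    print(y, round(direct[y], 1), round(chain[y], 1))
```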


II. Quantity Index Numbers 

A quantity index number is intended to measure increases or 
decreases in the physical quantity of a certain type of goods 
imported or exported, or in the production of a certain group of 
commodities in a country, or in the total physical production of a 
factory, or associated group of factories making a variety of 
commodities. 

Let $Q_0', Q_0'', \dots, Q_0^{(n)}$ be the quantities of n commodities 
produced in the base period, $Y_0$ for example, and 
$Q_1', Q_1'', \dots, Q_1^{(n)}$ the quantities of the same commodities 
produced in $Y_1$, as in the following table. 

          Annual Production in Million Yards 

   Commodity.     Y0      Y1      Y2      Y3 
   A             160     200     220     250 
   B              10      12      15      20 
   C              80     100     110     120 


Example XV. 

            Simple Aggregate of Quantities 

   Totals:    250     312     345     390   million yards 
   I.N.:      100     124.8   138     156 

This index merely tells us that, yards of A, B and C being regarded 
as equally important, yardage has gone up 24.8, 38 and 56% 
respectively in Y1, Y2 and Y3 as compared with Y0. 


Example XVI. 

          Relatives of Quantities Produced in 

   Commodity.       Y0      Y1       Y2       Y3 
   A               100     125      137.5    156.25 
   B               100     120      150      200 
   C               100     125      137.5    150 
   Totals:         300     370      425      506.25 
   A.M. = I.N. =   100     123.3    141.7    168.75 

$$\text{I.N.} = \frac{100}{n}\sum\frac{Q_1}{Q_0}, \quad \frac{100}{n}\sum\frac{Q_2}{Q_0}, \text{ etc.}$$ 

Here, the index for any year equals 100 plus one-third of the 
percentage increases for that year, each percentage increase being taken 
to be of the same importance as the others, irrespective of the quantity 
to which it refers. 

Neither of the above results can be regarded as having much 
merit, and it is obvious that some form of weighting is needed if the 
index is to have much significance. 

Numbers proportional to the yardages may be suggested, e.g., 
16 : 1 : 8; but if these weights are applied to the relatives, the 
result is merely to obtain the indices of Example XV in a more 
roundabout way. 
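The point can be verified numerically. The following Python sketch (variable names are illustrative) applies weights of 16 : 1 : 8 to the quantity relatives for Y1 and compares the result with the simple aggregate index of Example XV.

```python
q0 = [160, 10, 80]      # base-year yardages of A, B, C (million yards)
q1 = [200, 12, 100]     # yardages in Y1

# Simple aggregate of quantities (Example XV).
aggregate_index = 100 * sum(q1) / sum(q0)

# Quantity relatives weighted in proportion to base-year yardages.
weights = [16, 1, 8]
relatives = [100 * b / a for a, b in zip(q0, q1)]
weighted_relatives_index = (
    sum(w * r for w, r in zip(weights, relatives)) / sum(weights)
)

print(aggregate_index, weighted_relatives_index)
```

Each weight cancels the base-year quantity in the denominator of its relative, which is why the two results coincide.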

It may be stated that the only satisfactory method of weighting 
is (a) to use the unit prices as weights applied to the quantities 
or ( b) to use total values as weights applied to the quantity 
relatives. 

The prices or values in the base year—Y 0 = 100—may be adopted 
as weights, as may those of any other year; that is to say, it is 
not necessary that the time base year and the weights base year 
should be the same. 


Example XVII. Suppose the following additional information be given: 

                  Total value of production    Average price per 
   Commodity.     in year Y0.                  unit in year Y0. 
   A                £80.0 million                   10s. 
   B                 12.5    „                      25s. 
   C                 48.0    „                      12s. 

Then the aggregative method gives the following result: 

   Commodity.        Q0 × P0.                Q1 × P0. 
   A              160 × 10 = 1600        200 × 10 = 2000 
   B               10 × 25 =  250         12 × 25 =  300 
   C               80 × 12 =  960        100 × 12 = 1200 
                             ----                   ---- 
                             2810                   3500 

Then quantity index for Y1, Y0 = 100, 

$$= 100\,\frac{\Sigma Q_1 P_0}{\Sigma Q_0 P_0} = 100 \times \frac{3500}{2810} = 124.55.$$ 

Applying the relative method we get: 

                                   Weight proportional 
   Commodity.   Relative for Y1.   to value in Y0.       Product. 
   A                125                 8                 1000 
   B                120                 1.25               150 
   C                125                 4.8                600 
                                       -----              ---- 
                                       14.05              1750 

   I.N. for Y1 = 1750/14.05 = 124.55 

   Similarly, I.N. for Y2 = 138.6 
         and  I.N. for Y3 = 158.0. 

It should be noted that the information given by the index for Y1, 
viz. 124.55, is that if Y1 production be valued at Y0 prices its total 
value compared with that of Y0 production at Y0 prices is as 124.55 : 100. 
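A short Python sketch, with illustrative function names, confirms that the aggregative method and the relative method of Example XVII give the same quantity index for each year.

```python
quantities = {
    "Y0": [160, 10, 80],
    "Y1": [200, 12, 100],
    "Y2": [220, 15, 110],
    "Y3": [250, 20, 120],
}
p0 = [10, 25, 12]   # base-year unit prices in shillings


def aggregative_index(qn, q0, p0):
    """100 * sum(Qn * P0) / sum(Q0 * P0)."""
    return 100 * sum(q * p for q, p in zip(qn, p0)) / sum(q * p for q, p in zip(q0, p0))


def relative_index(qn, q0, p0):
    """Quantity relatives weighted by base-year values Q0 * P0."""
    values = [q * p for q, p in zip(q0, p0)]
    relatives = [100 * a / b for a, b in zip(qn, q0)]
    return sum(v * r for v, r in zip(values, relatives)) / sum(values)


q0 = quantities["Y0"]
results = {
    year: (aggregative_index(qn, q0, p0), relative_index(qn, q0, p0))
    for year, qn in quantities.items()
}
for year, (agg, rel) in results.items():
    print(year, round(agg, 2), round(rel, 2))
```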

Example XVIII. Calculate a number which will indicate the 
percentage change in volume of traffic (Oct. 1929 = 100) from October 
1929 to October 1930, when account is taken of the relative values 
of the different kinds of traffic.                       (Inst. T.) 

                              Tons ('000).          Receipts (£'000). 
   Type of traffic.       Oct. 1929.  Oct. 1930.       Oct. 1929. 
   (i)   Merchandise         1246        1206             776 
   (ii)  Minerals            1125         981             252 
   (iii) Fuel                4794        4229             562 

           % increase in tonnage 
              in Oct. 1930.         Weights.       Products. 
   (i)           − 3.2                776          − 2,483.2 
   (ii)          −12.8                252          − 3,225.6 
   (iii)         −11.8                562          − 6,631.6 
                                     ----          --------- 
                                     1590          −12,340.4 

   Percentage increase = −12,340.4/1590 = −7.76 

   ∴ I.N. = 92.2. 



It is not necessary to work out the quantity relatives. It is simpler 
to proceed as follows: 

   (1) Merchandise     776/1246 × 1206 =  751.1 
   (2) Minerals        252/1125 ×  981 =  219.8 
   (3) Fuel            562/4794 × 4229 =  495.7 
                                         ------ 
                                         1466.6 

Then $Q_{01} = \dfrac{\Sigma P_0 Q_1}{\Sigma P_0 Q_0} \times 100 = \dfrac{1466.6}{15.9} = 92.2$. 
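The shorter method of Example XVIII can be sketched in Python. Variable names are illustrative; the receipts of October 1929 serve as the value weights, exactly as in the worked solution.

```python
tons_1929 = [1246, 1125, 4794]      # merchandise, minerals, fuel ('000 tons)
tons_1930 = [1206, 981, 4229]
receipts_1929 = [776, 252, 562]     # £'000, used as value weights

# Revalue Oct. 1930 tonnage at Oct. 1929 receipts per ton: P0 * Q1.
revalued = [r / t0 * t1 for r, t0, t1 in zip(receipts_1929, tons_1929, tons_1930)]

volume_index = 100 * sum(revalued) / sum(receipts_1929)
print(round(volume_index, 1))
```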

Laspeyre and Paasche Indices 
Example XIX. Consider the following table: 

                   Quantity ('000 tons).       Value (£'000). 
   Commodity.        1937.     1940.          1937.     1940. 
   A                  350       400            378       480 
   B                  200       180            260       360 
   C                  140       200             70       220 
   D                   80       100            100       140 
                                               ---      ---- 
                                               808      1200 
                                               (i)      (ii) 

Calculate average, or unit, prices in both 1937 and 1940. 

                   Unit prices (£). 
   Commodity.       1937.     1940. 
   A                1.08      1.20 
   B                1.30      2.00 
   C                0.50      1.10 
   D                1.25      1.40 

Then calculate the value (a) of 1940 production at 1937 prices and 
(b) of 1937 production at 1940 prices. 

                 1940 quantities at       1937 quantities at 
   Commodity.    1937 prices (£'000).     1940 prices (£'000). 
   A                   432                      420 
   B                   234                      400 
   C                   100                      154 
   D                   125                      112 
                       ---                     ---- 
                       891                     1086 
                      (iii)                    (iv) 

The four totals of columns marked (i), (ii), (iii), (iv) have the 
following meanings: 

   (i) $\Sigma Q_0 P_0$;  (ii) $\Sigma Q_1 P_1$;  (iii) $\Sigma Q_1 P_0$;  (iv) $\Sigma Q_0 P_1$. 





and from them can be obtained four Index Numbers as follows: 

   (a) $100\,\dfrac{\Sigma Q_0 P_1}{\Sigma Q_0 P_0} = \dfrac{1086}{8.08}$, 
       i.e., $P_{01}$ = 134.41, price index for 1940 with 1937 = 100. 

   (b) $100\,\dfrac{\Sigma Q_1 P_0}{\Sigma Q_1 P_1} = \dfrac{891}{12.00}$, 
       i.e., $P_{10}$ = 74.25, price index for 1937 with 1940 = 100. 

   (c) $100\,\dfrac{\Sigma Q_1 P_0}{\Sigma Q_0 P_0} = \dfrac{891}{8.08}$, 
       i.e., $Q_{01}$ = 110.27, quantity index for 1940 with 1937 = 100. 

   (d) $100\,\dfrac{\Sigma Q_0 P_1}{\Sigma Q_1 P_1} = \dfrac{1086}{12.00}$, 
       i.e., $Q_{10}$ = 90.5, quantity index for 1937 with 1940 = 100. 

The above four indices are of the weighted aggregative type with 
base year weighting: the denominators are either $\Sigma Q_0 P_0$ or $\Sigma Q_1 P_1$. 
They are known as the “Laspeyre” type of index. 

The reader will note, however, that the following results, obtained 
from the same four totals, would appear to be equally admissible as 
price and quantity indices, though with a somewhat different meaning. 
They are known as “Paasche” indices, and are symbolized by $P_{01}'$, 
$Q_{01}'$, etc., to distinguish them from the Laspeyre type. 

$$P_{01}' = \frac{\Sigma Q_1 P_1}{\Sigma Q_1 P_0} \times 100, \qquad P_{10}' = \frac{\Sigma Q_0 P_0}{\Sigma Q_0 P_1} \times 100,$$ 

$$Q_{01}' = \frac{\Sigma Q_1 P_1}{\Sigma Q_0 P_1} \times 100, \qquad Q_{10}' = \frac{\Sigma Q_0 P_0}{\Sigma Q_1 P_0} \times 100.$$ 

Comparison of Laspeyre and Paasche Price Indices 

Roughly, P 01 gives the cost of maintaining the base-year rate of 
consumption at the current-year prices, compared with the base-year 
cost, while P 01 ' tells what it would have cost in the base year to have 
consumed current year quantities, compared with current year cost. 

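Note that, apart from a factor of 100, 

$$P_{01} \times Q_{01}' = \frac{\Sigma Q_0 P_1}{\Sigma Q_0 P_0} \times \frac{\Sigma Q_1 P_1}{\Sigma Q_0 P_1} = \frac{\Sigma Q_1 P_1}{\Sigma Q_0 P_0} = V_{01},$$ 

$$P_{01}' \times Q_{01} = \frac{\Sigma Q_1 P_1}{\Sigma Q_1 P_0} \times \frac{\Sigma Q_1 P_0}{\Sigma Q_0 P_0} = \frac{\Sigma Q_1 P_1}{\Sigma Q_0 P_0} = V_{01},$$ 

where $V_{01}$ is the value index for the current year as compared with 
the base year. 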
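The Laspeyre and Paasche indices of Example XIX, and the identity linking them to the value index, can be checked with a Python sketch. The helper `aggregate` and the variable names are illustrative assumptions.

```python
# Example XIX: quantities ('000 tons) and values (£'000) for A, B, C, D.
q0 = [350, 200, 140, 80]      # 1937 quantities
q1 = [400, 180, 200, 100]     # 1940 quantities
v0 = [378, 260, 70, 100]      # 1937 values
v1 = [480, 360, 220, 140]     # 1940 values

p0 = [v / q for v, q in zip(v0, q0)]   # 1937 unit prices
p1 = [v / q for v, q in zip(v1, q1)]   # 1940 unit prices


def aggregate(q, p):
    """Sum of quantity times price over all commodities."""
    return sum(a * b for a, b in zip(q, p))


q0p0, q1p1 = aggregate(q0, p0), aggregate(q1, p1)   # totals (i) and (ii)
q1p0, q0p1 = aggregate(q1, p0), aggregate(q0, p1)   # totals (iii) and (iv)

laspeyre_price = 100 * q0p1 / q0p0
laspeyre_quantity = 100 * q1p0 / q0p0
paasche_price = 100 * q1p1 / q1p0
paasche_quantity = 100 * q1p1 / q0p1
value_index = 100 * q1p1 / q0p0

print(round(laspeyre_price, 2), round(paasche_price, 2))
print(round(laspeyre_quantity, 2), round(paasche_quantity, 2))
print(round(value_index, 2))
```

A Laspeyre index of one kind multiplied by the Paasche index of the other kind, divided by 100, reproduces the value index.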





Example XX. Consider the following extract from the Board of Trade 
table of the external trade of the United Kingdom: 

                   Retained Imports 

              Declared value      Volume index 
   Year.         (£mn.)           (1938 = 100). 
   1938            858                100 
   1939            840                 97 
   1940           1126                 94 
   1941           1132                 82 

The volume index is obtained by revaluing current-year quantities 
at base-year prices and expressing the result as a percentage of the 
value of the retained imports in 1938, 

i.e., vol. index for 1939 
   = (1939 quantities at 1938 prices)/(1938 quantities at 1938 prices) × 100. 

The “standardized value” of 1939 quantities 
   = 1939 quantities at base year (1938) prices. 

The standardized values of retained imports, in the above table, can 
therefore be obtained as follows: 

   Standardized value for 1939 = 858 × 97/100 
                          1940 = 858 × 94/100, 

i.e., $\Sigma Q_n P_0 = \Sigma P_0 Q_0 \times \dfrac{\text{volume index for year } n}{100}$. 

Instead of giving “declared values” and “volume indices”, a 
common device is to state the declared values of retained imports, 
e.g., over a number of years, and the standardized value for each 
year. From such a table it is easy to obtain (a) a volume index and 
(b) an index of average values for each year, as the following 
illustration shows. 


Example XXI. 

                   Retained Imports 
                       (£mn.) 

                                  Value on basis of 
   Year.     Declared value.        1938 values. 
   1938           858                   858 
   1939           840                   832 
   1940          1126                   807 
   1941          1132                   704 

                                  [Board of Trade Journal] 





Re-write the above table in symbols. 

   (Col. 1)      (Col. 2)             (Col. 3) 
    Year.     Declared value.    Standardized value. 
    1938      $\Sigma P_0 Q_0$       $\Sigma P_0 Q_0$ 
    1939      $\Sigma P_1 Q_1$       $\Sigma P_0 Q_1$ 
    1940      $\Sigma P_2 Q_2$       $\Sigma P_0 Q_2$ 
    1941      $\Sigma P_3 Q_3$       $\Sigma P_0 Q_3$ 

We can then obtain: 

(1) A Laspeyre Volume Index by dividing the standardized 
value of each year in Col. 3 by $\Sigma P_0 Q_0$. 

(2) A Paasche Index of Average Value by dividing the entry in 
Col. 2 for any year by the entry for that year in Col. 3. 

Thus Vol. index for 1939 = 100 × 832/858 = 97.0 

Price, or Av. Value, index for 1939 = 100 × 840/832 = 101.0. 
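The two indices of Example XXI can be produced for every year of the table with a brief Python sketch; the dictionary names are illustrative.

```python
# Retained imports (£mn.), from Example XXI.
declared = {1938: 858, 1939: 840, 1940: 1126, 1941: 1132}
standardized = {1938: 858, 1939: 832, 1940: 807, 1941: 704}   # at 1938 values

base = standardized[1938]

# (1) Laspeyre volume index: standardized value over base-year value.
volume_index = {y: 100 * standardized[y] / base for y in declared}

# (2) Paasche index of average value: declared over standardized value.
average_value_index = {y: 100 * declared[y] / standardized[y] for y in declared}

for y in sorted(declared):
    print(y, round(volume_index[y], 1), round(average_value_index[y], 1))
```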

The reader should work through the following more realistic calcula¬ 
tion. His answers may not quite agree with those here stated, as the 
latter were obtained from more decimal figures than are here given. 

Example XXII. 

        Uganda-Kenya Export Trade (Principal Commodities) 
              (Figures from Board of Trade Journal) 

                                            January-September 
                                          1947.                 1946. 
                                               Value                 Value 
   Commodity.                 Unit.   Quantity. (£'000).   Quantity. (£'000). 
   I.   Raw cotton            cental    869,047   6,090      874,552   5,614 
   II.  Coffee, raw           cwt.      491,074   2,029      693,162   2,231 
   III. Sisal fibre and tow   ton        18,002     945       19,486     621 
   IV.  Tea                   cwt.       80,186     829       59,875     407 
   V.   Cigarettes and 
          tobacco             lb.     1,574,900     570    1,854,655     581 
   VI.  Hides, dry, and 
          dry salted          cwt.       56,174     479       32,355     161 
                                                 ------               ------ 
                                                 10,942                9,615 

                  Price per unit (£).      1947 quantities    1946 quantities 
   Commodity.     1946.        1947.       at 1946 prices.    at 1947 prices. 
   I             6.41929      7.00768         5,578.66           6,128.58 
   II            3.21858      4.13176         1,580.56           2,863.98 
   III          31.86903     52.49417           573.71           1,022.90 
   IV            6.79749     10.33846           545.06             619.02 
   V             0.313266     0.361928          493.36             671.25 
   VI            4.97605      8.52708           279.52             275.89 
                                              --------          --------- 
                                              9,050.87          11,581.62 



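Then, with 1946 as year 0 and 1947 as year 1: 

$$100\,\frac{\Sigma Q_0 P_1}{\Sigma Q_0 P_0} = \frac{11{,}581.62}{96.15} = 120.45; \qquad 100\,\frac{\Sigma Q_1 P_0}{\Sigma Q_1 P_1} = \frac{9050.87}{109.42} = 82.71$$ 

$$100\,\frac{\Sigma P_0 Q_1}{\Sigma P_0 Q_0} = \frac{9050.87}{96.15} = 94.13; \qquad 100\,\frac{\Sigma P_1 Q_0}{\Sigma P_1 Q_1} = \frac{11{,}581.62}{109.42} = 105.84$$ 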
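Example XXII can be reworked from the quantities and values alone, as the Python sketch below shows. The tuple layout is an illustrative assumption. Because unit prices are carried at full precision, the results may differ from the worked figures in the last decimal place.

```python
# (commodity, 1946 quantity, 1946 value £'000, 1947 quantity, 1947 value £'000)
rows = [
    ("Raw cotton", 874552, 5614, 869047, 6090),
    ("Coffee, raw", 693162, 2231, 491074, 2029),
    ("Sisal fibre and tow", 19486, 621, 18002, 945),
    ("Tea", 59875, 407, 80186, 829),
    ("Cigarettes and tobacco", 1854655, 581, 1574900, 570),
    ("Hides, dry, and dry salted", 32355, 161, 56174, 479),
]

q0p0 = sum(v46 for _, _, v46, _, _ in rows)                        # 1946 value
q1p1 = sum(v47 for _, _, _, _, v47 in rows)                        # 1947 value
q1p0 = sum(q47 * v46 / q46 for _, q46, v46, q47, _ in rows)        # 1947 qty, 1946 prices
q0p1 = sum(q46 * v47 / q47 for _, q46, _, q47, v47 in rows)        # 1946 qty, 1947 prices

price_index_1947 = 100 * q0p1 / q0p0       # 1946 = 100
quantity_index_1947 = 100 * q1p0 / q0p0    # 1946 = 100
price_index_1946 = 100 * q1p0 / q1p1       # 1947 = 100
quantity_index_1946 = 100 * q0p1 / q1p1    # 1947 = 100

print(round(price_index_1947, 2), round(quantity_index_1947, 2))
print(round(price_index_1946, 2), round(quantity_index_1946, 2))
```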


Example XXIII. 

                    United Kingdom Exports 

                            Monthly averages.    Volume index. 
   1938                       £30,437,000            100 
   1946, 1st quarter           50,778,000             91 
   1946, 2nd quarter           63,212,000            110 

                                     [Board of Trade Journal] 

Estimate the price index (1938 = 100) for the first and second 
quarters of 1946. 

1946, First quarter: 

   Value of this volume at 1938 prices = £30,437,000 × 0.91 
   Price index for first quarter = (50,778 × 100)/(30,437 × 0.91) 
                                 = 183. 

1946, Second quarter: 

   Value of second quarter quantity at 1938 prices = £30,437,000 × 1.1 
   Price index for second quarter = (63,212 × 100)/(30,437 × 1.1) 
                                  = 189. 
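The estimate in Example XXIII amounts to dividing the value index by the volume index. A minimal Python sketch, with illustrative names, is:

```python
base_value = 30437                      # 1938 monthly average, £'000
periods = {
    "1946, 1st quarter": (50778, 91),   # (monthly average £'000, volume index)
    "1946, 2nd quarter": (63212, 110),
}

price_index = {}
for period, (value, volume_index) in periods.items():
    value_at_1938_prices = base_value * volume_index / 100
    price_index[period] = 100 * value / value_at_1938_prices

for period, index in price_index.items():
    print(period, round(index))
```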


Example XXIV. 

   United Kingdom Export of Domestic Produce and Manufactures 
            (Monthly Averages or calendar months) 

                                        1938.         3rd Quarter 1947. 
                                        Value        Value      Vol. index, 
                                       (£'000).     (£'000).    1938 = 100. 
   I.   Food, drink and tobacco          2,991        5,894         86 
   II.  Raw materials and articles 
          mainly unmanufactured          4,743        2,600         21 
   III. Articles wholly or mainly 
          manufactured                  30,437       89,335        131 
   IV.  Animals not for food                57          337        148 
   V.   Parcel post                      1,002        2,717        120 
                                        ------      ------- 
                                        39,230      100,883 






Estimate the average change from 1938 to the third quarter of 1947 
in (i) the total physical volume of U.K. domestic exports and (ii) the 
overall export price level. 

                                              (L.U. B.Com. 1948) 

[Monthly Digest of Statistics; Source: U.K. Central Statistical Office] 

Note: 

   £39,230,000 = $\Sigma P_0 Q_0$     £2,991,000 = $\Sigma P_0' Q_0'$     £5,894,000 = $\Sigma P_1' Q_1'$ 
   £100,883,000 = $\Sigma P_1 Q_1$    £4,743,000 = $\Sigma P_0'' Q_0''$    £2,600,000 = $\Sigma P_1'' Q_1''$ 
   etc. 

   Vol. index 86 = $100\,\dfrac{\Sigma P_0' Q_1'}{\Sigma P_0' Q_0'}$,   Vol. index 21 = $100\,\dfrac{\Sigma P_0'' Q_1''}{\Sigma P_0'' Q_0''}$,   etc. 

Hence: 

   $\Sigma P_0' Q_1'$       =  2,991 × 0.86 =  2,572.26 
   $\Sigma P_0'' Q_1''$     =  4,743 × 0.21 =    996.03 
   $\Sigma P_0''' Q_1'''$   = 30,437 × 1.31 = 39,872.47 
   $\Sigma P_0^{iv} Q_1^{iv}$ =     57 × 1.48 =     84.36 
   $\Sigma P_0^{v} Q_1^{v}$   =  1,002 × 1.20 =  1,202.40 
                                             --------- 
                                             44,727.52 = $\Sigma P_0 Q_1$ 

(i) Volume index 

   = $\dfrac{\Sigma P_0 Q_1}{\Sigma P_0 Q_0} \times 100 = 100 \times \dfrac{44{,}727}{39{,}230} = 114.0$, say 114. 

(ii) Overall export price level 

   = $\dfrac{\Sigma P_1 Q_1}{\Sigma P_0 Q_1} \times 100 = \dfrac{100{,}883}{44{,}727} \times 100 = 225.55$, say 226. 

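The Marshall-Edgeworth Type of Index 

(a) Price Index. The weights are the total (or average) quantities 
of the base year and the current year and are applied to the prices. 

$$P_{01} = \frac{\Sigma P_1 (Q_0 + Q_1)}{\Sigma P_0 (Q_0 + Q_1)} \times 100.$$ 

(b) Volume Index. And similarly: 

$$Q_{01} = \frac{\Sigma Q_1 (P_0 + P_1)}{\Sigma Q_0 (P_0 + P_1)} \times 100.$$ 

Fisher’s “Ideal” Index 

$$P_{01} = 100\sqrt{\frac{\Sigma P_1 Q_0}{\Sigma P_0 Q_0} \times \frac{\Sigma P_1 Q_1}{\Sigma P_0 Q_1}}$$ 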



This index is called “ideal” because it satisfies both the “time” 
and “factor” reversal tests. On the other hand it is difficult to 
say what, precisely, such an index measures. 

The factor reversal test is satisfied if 

$$P_{01} \times Q_{01} = V_{01} = \frac{\Sigma P_1 Q_1}{\Sigma P_0 Q_0}.$$ 

The Laspeyre price index number for year n with $Y_0$ as base is 
given by 

$$\frac{\Sigma P_n Q_0}{\Sigma P_0 Q_0} \times 100 = P_{0n}.$$ 

All the indices in a series of years are related to the same base, and 
any two years of the series are comparable. 

The Paasche price index number for year n is given by 

$$\frac{\Sigma P_n Q_n}{\Sigma P_0 Q_n} \times 100 = P_{0n}'.$$ 

This type, therefore, involves the use of a new set of weights 
every year, or other period. Thus, in addition to the practical 
difficulty of ascertaining reliable weights, it has the disadvantage 
that while P 01 ', P 02 ', etc., are directly comparable with Y 0 , they are 
not directly comparable with one another. 

Since the Marshall-Edgeworth index also has shifting weights, it 
suffers from the same disabilities as does the Paasche index. 
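As an illustration only, the Marshall-Edgeworth and Fisher formulae can be applied to the data of Example XIX (the text does not work these figures). The Python sketch below, with illustrative names, also checks the factor and time reversal tests for the Fisher index.

```python
import math

# Example XIX: quantities ('000 tons) and values (£'000) for A, B, C, D.
q0 = [350, 200, 140, 80]
q1 = [400, 180, 200, 100]
p0 = [v / q for v, q in zip([378, 260, 70, 100], q0)]   # 1937 unit prices
p1 = [v / q for v, q in zip([480, 360, 220, 140], q1)]  # 1940 unit prices


def aggregate(q, p):
    """Sum of quantity times price over all commodities."""
    return sum(a * b for a, b in zip(q, p))


q0p0, q1p1 = aggregate(q0, p0), aggregate(q1, p1)
q1p0, q0p1 = aggregate(q1, p0), aggregate(q0, p1)

# Marshall-Edgeworth price index: prices weighted by Q0 + Q1.
marshall_edgeworth_price = 100 * (q0p1 + q1p1) / (q0p0 + q1p0)

# Fisher index: geometric mean of the Laspeyre and Paasche forms.
fisher_price = 100 * math.sqrt((q0p1 / q0p0) * (q1p1 / q1p0))
fisher_quantity = 100 * math.sqrt((q1p0 / q0p0) * (q1p1 / q0p1))
fisher_price_reversed = 100 * math.sqrt((q1p0 / q1p1) * (q0p0 / q0p1))

value_index = 100 * q1p1 / q0p0

print(round(marshall_edgeworth_price, 2), round(fisher_price, 2))
print(round(fisher_price * fisher_quantity / 100, 2), round(value_index, 2))
```

The product of the Fisher price and quantity indices (divided by 100) equals the value index, and the forward and reversed Fisher price indices multiply to 100 × 100.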


Exercise 

1. The expenditure of a certain business on materials can be grouped 
under three main headings in the ratio of 6 : 5 : 3. If average prices in these 
groups rise by 42, 35 and 28%, respectively, by what percentage is expenditure 
on materials increased if the same amounts as before are purchased ? 

2. When the cost of tobacco was increased by 50% a certain hardened 
smoker who maintained his former scale of consumption said that the rise had 
increased his cost of living by 5%. What percentage of his cost of living 
was due to buying tobacco before the change in price ? 

3. If the wages of a group of workmen are increased by 40% and the cost 
of living rises by 25%, how much greater is their purchasing power than 
before the changes took place ? 

4. (a) The tax on a certain dutiable article was reduced by 20%. The 
revenue derived from the tax was thereby increased by 15%. By what 
percentage did the number of articles imported increase ? 

(b) In a certain year the turnover of a company was 10%, the expenses 
were 5% and the net profit was 30% in excess of the corresponding figures for 
the previous year. What percentage of the turnover was the profit in the 
first year ? 


5. 



Prices 



Commodity. 

1934. 

1935 

1936. 

1937 

1938. 


s d 

s d 

s d 

s d 

s d 

(i) 

6 

8 

9 

10 

H 

(n) 

4 

3* 

5 

6 

H 

(in) 

1 1 

1 0 

1 2 

1 3 

l l 

(iv) 

9 

11 

10 

11 

1 0 

(v) 

Compute— 

2 3 

2 7 

2 9 

2 10 

2 8 


(a) A simple arithmetic mean of price relatives index. 

(b) A simple arithmetic mean of price relatives index on the chain 
base method, 1934 = 100. 

6. Compute an index number to show average variation in the production 
of a group of commodities A, B, C, D. 

                   Units produced ('000).        Cost. 
   Commodity.       Q0      Q1      Q2            C0 
   A                56      75      82            1.5 
   B                24      32      40            2.6 
   C               156     127     115            1.2 
   D                84      65      72            1.0 

   Q0 = quantity produced in year Y0 = base year. 
   Q1 = quantity produced in year Y1, etc. 
   C0 = number proportional to value at cost in year Y0 
      = “weight.” 


7. Calculate a price index from the following data based on the Simple 
G.M. Use logarithms. 

                                             Average price       Average price 
   Commodity.   How quoted.                 1930 (base year).        1938. 
   A            Pence per lb.                    16.1                34.2 
   B            £ per ton                         9.2                 8.7 
   C            Shillings per oz.                15.1                12.5 
   D            Pence per yd.                     5.6                 4.8 
   E            Shillings per gall.              11.7                13.4 
   F            Percentage on base figure       100                 117 

Now reverse the process, taking 1938 as base year and 1930 as current 
year and show that the two results are strictly consistent. 

                                                       (Soc. I.A.A.) 

8. (a) A certain (fictitious) index number of wholesale prices, based on 
the simple arithmetic mean of price relatives, comprises 40 items. These 
are divided into 7 groups. A separate index is published for each group and 
also an index for the whole 40 items. Find the index number for all the 
groups combined, for 1938, from the following data. 

                No. of items     Group Index 
   Group.        in group.        for 1938. 
   A                10               120 
   B                 5                95 
   C                 8               115 
   D                 4               142 
   E                 3                86 
   F                 4               100 
   G                 6               105 
                    --- 
                    40 

(b) Calculate the index for 1938 for all the groups combined from the above 
data on the supposition that it is based on the simple geometric mean. 

9.    Canada Index Number of Industrial Production (1926 = 100) 

                              1938.    1939.    1940.    Weights. 
   (i)   Mining                195      215      238       11.1 
   (ii)  Manufacture           107      119      147       67.5 
   (iii) Electrical Power      218      238      252        5.6 
   (iv)  Building               52       51       88       15.8 
                                                          ----- 
                                                          100.0 

(a) Obtain the general index number of industrial production for the three 
years, 1926 = 100. 

(b) Obtain the index numbers for 1939 and 1940 on 1938 as base. 

                                                   (L.U. Inter. Com.) 

10.           Price Index of Materials and Labour 

                     Price Relatives (June 1914 = 100) 

               1914.   1915.   1916.   1917.   1918.   1919. 
   Lead          98     120     179     235     191     146 
   Pig Iron      99     104     150     278     256     223 
   Labour        98     101     114     129     160     185 

               1920.   1921.   1922.   1923.   Weights. 
   Lead         209     116     145     193        3 
   Pig Iron     329     174     186     203       60 
   Labour       222     203     197     214       37 

Calculate the weighted index number of materials and labour for each of 
the years.                                         (L.U. Inter. Com.) 

11. A and B are two commodities judged of equal importance, and their 
prices are as follows: 

                                Year 1.    Year 2. 
   A. £ per ton                   10         15 
   B. Shillings per yard           3          2 

Work out an unweighted price index taking Year 1 as base, and then work 
it out taking Year 2 as base. Do the results make sense ? If not, give your 
observations.                                          (Soc. I.A.A.) 


12. 
                        Relative expenditure.     % increase in price, 
                                                     November 1940 
                         1914.      1937-38.         (1914 = 100). 
   Food                   60          39                 73 
   Rent and Rates         16          13                 64 
   Clothing               12          11                220 
   Fuel and Light          8           7                119 
   Other Items             4          30                121 
                         ---         --- 
                         100         100 

Compute the Cost-of-Living Index Number in November 1940 (1914 = 100), 
using as weights (a) the relative expenditures in 1914 and (b) those in 1937-38. 
Comment on the different results obtained. 

                                                   (L.U. Inter. Com.) 

13.                    Exports of Pottery, etc. 

                          Quantities (cwt.).           Value (£). 
                           1938.      1942.         1938.        1942. 
   Tiles                  208,300    278,700       238,100      598,800 
   Sanitary Ware          424,700    240,400       643,900      766,500 
   China                   20,430     27,730       424,400      742,800 
   Electrical Ware         47,580     35,440       151,200      154,900 
   Other Earthenware      500,900    427,100     1,999,800    2,753,500 

Compute, for this group of exports, an index number which gives the 
percentage change in volume of exports from 1938 to 1942. 

                                                      (Inst. T. 1945) 

14. Describe how a price index number could be computed from data such 
as those in the above question. 

                                                      (Inst. T. 1945) 


15.            Export of Cotton Yarns and Manufactures 

                                 Quantities.               Value (£). 
                               1938.     1943.         1938.         1943. 
   Cotton yarns: 
     A (million lb.)           110.0     16.37       8,397,000     2,817,000 
     B (million lb.)           12.93     2.818       1,278,000       531,000 
   Cotton manufactures: 
     A (thousand sq. yds.)     235.3     69.32       3,841,000     2,854,000 
     B (thousand sq. yds.)     421.6     83.15       7,776,000     3,319,000 
     C (thousand sq. yds.)     281.9    102.9        7,152,000     5,805,000 
     D (thousand sq. yds.)     368.8     90.86      10,628,000     6,381,000 
     E (thousand sq. yds.)     78.58     28.08       2,564,000     1,922,000 

Compute a quantity index number (I) for 1943 on 1938 as base, using 1938 
values as weights and another (I₁) for 1938 on 1943 as base, using 1943 values 
as weights. 

Examine the reasons why the product I × I₁ differs from 10,000. 

                                              (L.U. B.Sc. Econ., 1945) 





16.                  Fuel Production in U.S.A. 

                                          Quantities.         Value in 1921 
   Kind of fuel.                     1910.    1921.    1923.  (million dollars). 
   Bituminous coal (million tons)    417.1    415.9    545.3       1948 
   Anthracite coal (million tons)    84.49    90.44    95.20        731 
   Oil (million barrels)             209.6    472.2    725.7        712 

Calculate a fuel production index number on 1910 as base. 

                                                           (Inst. T.) 

17.           Imports into Swansea: Wood and Timber 

                     Quantity (loads).        Values, 1924 
   Description.      1924.       1927.            (£). 
   Soft sawn         23,324      30,392         123,059 
   Planed             8,046      10,152          50,975 
   Pit props        128,603      97,711         173,344 
   Sleepers           2,384       1,085          12,216 

Make a quantity index number for 1927 on 1924 as base, using the 1924 
values as weights. 

                                                           (Inst. T.) 

18. The following were the average annual wholesale prices of selected 
cereal items, 1924 and 1938. 

                     Unit of price.              1924.    1938. 
   Wheat: 
     Variety 1       Shillings per cwt.          11.6      6.7 
     Variety 2           „     per 496 lb.       56.3     42.0 
   Flour                 „     per 280 lb.       44.3     32.0 
   Barley: 
     Variety 1           „     per cwt.          13.2     10.2 
     Variety 2           „     per 400 lb.       41.5     23.3 
   Oatmeal           £ per ton                   23.5     20.0 
   Maize             Shillings per 100 lb.        8.9      6.5 
   Rice                  „     per cwt.          16.8     10.6 

Calculate an index of cereal prices for 1938 upon the base year 1924, using 
an unweighted G.M. Then calculate an index for 1924 upon 1938 as base 
and comment on the results. 

                                                       (Soc. I.A.A.) 

19. Repeat the calculations of Question 17, using a simple A M. 

20. On October 31, 1942, the official cost-of-living index was 100% above 
the level of July 1914. For food alone the index figure was 63% above the 
level of July, 1914. State approximately the index figure for 31 October, 
1942, for all items other than food 

21. Compute simple geometric mean of price relatives index numbers, 
1934 = 100, from the data of Question 5, for the years 1935-38, inclusive. 


22. The average of wholesale prices was higher in 1937 than in 1936 by 
15.1%, the index numbers for the two years being 108.7 and 94.4 respectively 
(1930 = 100). This increase followed rises of 6.1, 1.0, and 2.8%, each year 
being compared with the preceding. In 1933 prices were the same as in 
1932, but 2.5% below 1931. Prices in 1931 were 12.2% below 1930. 

From these data compute the index numbers for each year from 1930 to 
1937. 

                                            (L.U. Inter. Com., 1938) 

23. In November 1937 increases in weekly wages were recorded for 1,127,000 
persons, amounting to £126,600. In the eleven months of 1937, including 
November, the corresponding figures were 4,928,000 and £723,300. Mining 
accounted for 293,000 and £52,800 in November and for 718,000 and £154,200 
in the eleven months. From these data fill in the following table. 

                       Wage Increases, 1937 

                 January-October.               November. 
             Number ('000).  Average.     Number ('000).  Average. 
   Mining         425                          293         3.6s. 
   Other 
   Total 

                                            (L.U. Inter. Com., 1938) 

24. Explain briefly how an index number of production is compiled. 
From the following table compute the index number for industry other 
than agriculture, for the years 1933 and 1936. 

                               Relative importance    Index numbers of production. 
                               in base year.           1924.    1933.    1936. 
   Agriculture                        7                 100     113.5    119.6 
   All Industries (including 
     Agriculture)                    40                 100     107.7    136.9 

                                            (L.U. Inter. Com., 1938) 


25 Compute quantity and price index numbers from the following data 



Quantity (units) 

Value (£) 


Commodity 

] 938 = Q 0 . 

1946 - Q, 

1938 

1946. 

1 

100 

150 

500 

900 

2 

80 

100 

320 

500 

3 

60 

72 

150 

aso» 

4 

30 

33 

360 

2.97; 


26. Use the following information and the data of Question 5 to compute 

(a) a weighted aggregate of prices index number, 

(b) a weighted arithmetic mean of price relatives index number, for 
each of the years 1935-38, 1934 = 100. 

                   No. of units purchased 
   Commodity.       per period in 1934. 
   (i)                     10 
   (ii)                     8 
   (iii)                    6 
   (iv)                     7 
   (v)                      4 

27. The total value of retained imports into the U.K. in 1938 was £71.5 mn. 
per month. The corresponding total for 1945 was £87.6 mn. per month. 
The index of volume of retained imports in 1945 compared with 1938 (= 100) 
was 62.0. Compute the price index for retained imports for 1945 on 1938 as 
base. 

                                                     (Inst. T., 1947) 

28. The following series shows for U.K. total imports (a) the declared value, 
and (b) the value on the basis of average values in 1930: 

                       U.K. Total Imports 

               Declared value      Value on basis of 
   Year.          (£mn.)           1930 values (£mn.) 
   1930            1044                 1044 
   1931             861                 1067 
   1932             702                  939 
   1933             675                  946 
   1934             731                  991 
   1935             756                 1012 
   1936             848                 1077 

                                     [Board of Trade Journal] 

Taking 1930 as base year, construct index numbers (1) of average values 
and (2) of volume for the years 1931-36. 

                                 (R.S.S. Certificate Specimen Paper) 

Hint. Note that the items in Col. 2 are $\Sigma P_0 Q_0$, $\Sigma P_1 Q_1$, etc., and that 
those in Col. 3 are $\Sigma P_0 Q_0$, $\Sigma P_0 Q_1$, etc. 

29. The weekly average traffic receipts of the main-line railway passenger 
trains and the L.P.T.B. for 1935-46 were (£'000): 1935, 1,847; 1936, 1,898; 
1937, 1,954; 1938, 1,972; 1939, 1,967; 1940, 1,998; 1941, 2,526; 1942, 
3,135; 1943, 3,562; 1944, 3,728; 1945, 4,028; 1946, 3,868. The weekly 
average loaded train miles for the same years (in millions) were respectively: 
4.99, 5.09, 5.20, 5.28, 4.73, 3.66, 3.69, 3.71, 3.73, 3.67, 3.94, and 4.34. The 
annual average wholesale price index numbers for the same years (1938 = 100) 
were respectively: 87.7, 93.0, 107.2, 100.0, 101.4, 134.6, 150.5, 157.1, 160.4, 
163.7, 166.7, and 172.7. 

Correct the traffic receipts to 1938 values by applying the wholesale price 
index number, and calculate for each year the receipts in £'s at 1938 values 
per loaded train mile. 

Draw time graphs, on the same chart, of (a) the receipts at 1938 values, 
(b) the weekly average loaded train miles, (c) receipts at 1938 values per 
loaded train mile. 

                                 (R.S.S. Certificate Specimen Paper) 


30.              Partial Census of Production, 1946 
                  Hats, Caps and Millinery Trade 
                     (Board of Trade Journal) 

                                          1946.     1937.     1935. 
   Net output (£'000)                     8,717     6,284     6,494 
   Average no. of persons employed       17,704    37,793    39,099 
   Net output per person employed (£)       492       166       166 

By applying the following price index numbers to the above data make a 
rough comparison of (a) the net output and (b) the net output per person 
employed, for the three years. 

   Year.     I.N. 
   1935       87.7 
   1938      100     (Base year) 
   1937      107.2 
   1946      172.7 

31.                      U.S.A. Production 

                 Production in million       Average price in 
                       bushels.              cents per bushel. 
                   1913.      1930.           1913.     1930. 
   Wheat            763        857             80        60 
   Corn            2447       2060             69        65 
   Oats            1122       1276             39        32 
   Barley           178        304             54        39 
   Potatoes         332        334             69        88 

                                   [U.S. Dept. of Agriculture] 

Calculate: 

   (i)   Total value of 1913 production at 1913 prices, 
   (ii)    „     „    „ 1930     „      „ 1930   „ 
   (iii)   „     „    „ 1913     „      „ 1930   „ 
   (iv)    „     „    „ 1930     „      „ 1913   „ 

and hence obtain index numbers of price and quantity for the two years. 


32. Board of Trade Index No. of Wholesale Prices 

                                    No. of items    Index for 
Group                               in group        Jan. 1948 
I.    Cereals                           20            178.5 
II.   Meat, Fish, Eggs                  20            118.5 
III.  Other Food and Tobacco            28            224.7 

      Total (Food and Tobacco)          68            173.9 

[Board of Trade Journal] 

Show that 173.9 is the weighted G.M. of the three group indices. 
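
A weighted geometric mean is most easily computed through logarithms. The sketch below (Python, added for illustration; `weighted_gm` is a hypothetical helper name) uses the numbers of items as weights. Computed at full precision the result differs slightly in the last figure from the published 173.9, which is the kind of discrepancy rounded logarithms would produce.

```python
import math

def weighted_gm(values, weights):
    """Weighted geometric mean: antilog of the weighted mean of the logs."""
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / sum(weights))

group_indices = [178.5, 118.5, 224.7]
items_in_group = [20, 20, 28]

gm = weighted_gm(group_indices, items_in_group)
print(round(gm, 1))
```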





33. Annual Average Prices: U.K. 

                               Prices in shillings (s.) and pence (d.) 
Commodity           Unit       1935        1938        1944        1945 
                               s.  d.      s.  d.      s.  d.      s.  d. 
Carrots             cwt.        4   9       9   2      14   5      14   9 
Onions              „           6   0       8   0      23   9      27  11 
Cabbage             dozen       1   7       1   4       2  11       3   2 
Cauliflowers        „           3   5       2  10       4   7       4   7 
Brussels Sprouts    cwt.       15   9      15   9      31   3      29   6 
Peas                „          20   0      17   0      59  11      59   3 
Beans               „          17   6      22   6      81  10      50   7 

[Annual Abstract of Statistics, 1935-46; Source: 
Ministry of Agriculture and Fisheries] 

(i) Calculate indices of the simple A.M. of price relatives type for the years 
1938, 1944 and 1945, 1935 = 100. 

(ii) Calculate the simple G.M. of price relatives for 1945, 1935 = 100, 
and the simple G.M. of price relatives for 1935, 1945 = 100. 


34. Interim Index of Industrial Production (Average 1946 = 100) 

                                                  Group indices 
Group                               Weights    June 1947    June 1948 
China and Earthenware                   4         122          149 
Glass                                   6         113          126 
Bricks, Cement, etc.                   21         138          160 
Metal: Ferrous                         38         110          120 
       Non-ferrous                     18         125          119 
Precision Instruments                   8         120          123 
Leather Goods, etc.                     6          97           89 
Manufacture of Wood and Cork           25         104          102 
Paper and Printing                     39         117          106 
Other Manufacturing Industries         19         137          150 

                                      184 

[Monthly Digest of Statistics; Source: Central Statistical Office] 

(i) Calculate index numbers of production for the above groups combined 
at both dates. 

(ii) There are 21 groups in the complete index, with total weights 1000. 

(a) If the complete index for June 1947 was 114, what was the index, 
at that date, for the remaining 11 groups? 

(b) If the index for the remaining 11 groups in June 1948 was 124.4, 
what was the index, at that date, for the whole 21 groups? 
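
Both parts of this exercise rest on the weighted arithmetic mean, and on the fact that weight × index can be added or subtracted across groups. A short Python sketch under that reading follows; the helper name `weighted_index` is illustrative, not from the text.

```python
weights   = [4, 6, 21, 38, 18, 8, 6, 25, 39, 19]       # sums to 184
june_1947 = [122, 113, 138, 110, 125, 120, 97, 104, 117, 137]
june_1948 = [149, 126, 160, 120, 119, 123, 89, 102, 106, 150]

def weighted_index(indices, wts):
    """Weighted arithmetic mean of group indices."""
    return sum(w * i for w, i in zip(wts, indices)) / sum(wts)

# (i) Combined index for the ten listed groups.
combined_1947 = weighted_index(june_1947, weights)
combined_1948 = weighted_index(june_1948, weights)

# (ii) The complete index has total weight 1000, so the other 11 groups carry the rest.
total_weight = 1000
listed_weight = sum(weights)
rest_weight = total_weight - listed_weight

# (a) Remove the listed groups' weighted points from the complete index of 114.
rest_1947 = (114 * total_weight - combined_1947 * listed_weight) / rest_weight

# (b) Add the two parts back together for June 1948.
whole_1948 = (combined_1948 * listed_weight + 124.4 * rest_weight) / total_weight

print(round(combined_1947, 1), round(combined_1948, 1))
print(round(rest_1947, 1), round(whole_1948, 1))
```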






35. Ministry of Labour Interim Index of Retail Prices 

17 June, 1947 = 100 

                                  % increase in price 
Group                             18 Nov. 1947            Weights 
1. Food                               3.2                   35 
2. Rent and Rates                     0.1                    9 
3. Clothing                           2.1                   10 
4. Fuel and Light                     6.9                    6 
5. Household durable Goods            4.0                    7 
6. Miscellaneous Goods                9.0                    3 
7. Services                           2.2                    8 
8. Drink and Tobacco                  4.0                   22 

                                                           100 

[Monthly Digest of Statistics; 
Source: Ministry of Labour] 

Compute the index number for 18 Nov., 1947. 

36. Interim Index of Industrial Production (Average 1946 = 100) 

                                                % increase on average 1946 
Industry                            Weight    In July 1948    In Oct. 1949 
Chemicals, etc.                       65           13              26 
Textiles                              55           17              35 
Clothing                              38            4              20 
Leather, etc.                          6          -18              23 
Food                                  60           10              22 
Drink and Tobacco                     47           -3               7 
Cork and Wood                         25           -1              10 
Paper and Printing                    39           -2              17 
Other Manufacturing Industries        19           36              54 
Building and Contracting              92           21              28 

                                     446 

[Monthly Digest of Statistics; Source: Central Statistical Office] 

Calculate an index of industrial production for the above group of industries 
for July, 1948, and for October, 1949. 

Explain what the "weights" in the above table represent. 


37. Exports of Pottery from the U.K. 

                                     Quantities ('000 cwt.)      Values (£000) 
                                     Jan.-Nov.   11/12 of     Jan.-Nov.   11/12 of 
Type                                 1948        1938         1948        1938 
(1) China including translucent 
    pottery                           63.1        24.2         2,672        389 
(2) Earthenware of all other 
    descriptions                     736.3       459.1         7,929      1,833 

    Totals                           799.4       483.3        10,601      2,222 

[Board of Trade Journal] 

(i) Find the average value for each type of export in each year. Re-value 
1948 quantities at 1938 values and 1938 quantities at 1948 values. 

(ii) Calculate index numbers of average values and of volume, 

(a) for 1948 on 1938 as base 

(b) for 1938 on 1948 as base. 

38. U.K. Exports of Manufactures, 1938 and 1949 

                                   1938                      1949 
                                   Value as re-     Value as re-     At 1938 aver. 
Group                              corded (£mn.)    corded (£mn.)    values (£mn.) 
Pottery, Glass, etc.                    9.6              48.5             20.4 
Iron and Steel Manufactures            41.7             126.6             56.8 
Non-ferrous Metal Manufactures         12.3              63.6             31.4 
Electrical Apparatus                   13.6              79.1             31.4 
Machinery                              57.9             278.7            119.9 
Cotton Manufactures                    49.7             159.1             36.2 
Woollen Manufactures                   26.8             104.2             35.3 
Other Textile Manufactures             16.2              66.8             19.1 
Apparel                                 8.5              29.4              9.6 
Footwear                                2.0               5.9              2.2 
Chemicals                              22.3              86.1             35.4 
Vehicles                               44.5             313.5            144.9 
Other Manufactures                     60.2             196.8             83.5 

Total                                 365.3            1558.3            626.1 

[Source: Board of Trade Journal, February 1950] 

What index numbers of volume and average value can be constructed from 
such data? Compute them for 1949 (1938 = 100) for each of the following 
groups of exports: (i) Metal Manufactures, Vehicles and Machinery (excluding 
Electrical Apparatus); (ii) Textile Manufactures (including Apparel other than 
Footwear); (iii) Chemicals; and (iv) Manufactures other than (i), (ii) and (iii). 

(L.U. B.Sc. Econ., 1950) 

39. Industrial Production in Certain European Countries 

           Quarterly              Quarterly averages, 1947 = 100 
           average,       ------ 1947 ------    ------ 1948 ------    -- 1949 -- 
           1947           1st  2nd  3rd  4th    1st  2nd  3rd  4th    1st  2nd 
Country    (1938 = 100)   qr.  qr.  qr.  qr.    qr.  qr.  qr.  qr.    qr.  qr. 
U.K.          116          90  101   99  110    112  112  106  117    120  119 
France         95          98  106   96  101    116  122  110  120    131  136 
Belgium       106          95  102   97  105    106  107  107  113    113  110 
Sweden        138          99  100   99  100    102  103  103  105    106  105 

[Source: Economic Commission for Europe] 

Link these index numbers together to give for each country a quarterly 
series showing changes in production (average 1938 = 100) and indicate why 
this process is only approximate. Represent the resulting series graphically 
and comment on the relative post-war movements in industrial production 
in the four countries. 

(L.U. B.Sc. Econ., 1960) 




CHAPTER IV 


ESTIMATES AND LIMITS OF ERROR 

Statistical data are the result of measurement and enumeration. 
No measurement can ever be exact. It can be stated as correct 
to so many significant figures, or to so many places of decimals, but 
the “ real ” value can never be known. 

Where the conditions admit of accurate counting, enumerations 
may be exact, but in the great majority of cases, they are estimates, 
subject to certain “ errors ”, which themselves are the result of 
estimation. 

It is important that the extent of the possible limits of error in 
calculations with such “ inaccurate numbers ” should be appreciated. 

Definitions 

Let X be the true value of any quantity and let x be its estimated 
value. 

Let $|X - x|$ represent the positive numerical difference between 
$X$ and $x$. 

Then the absolute error in the estimate $= |X - x|$ and the 
relative error in the estimate $= \dfrac{|X - x|}{x}$. 

The percentage error in the estimate $= \dfrac{|X - x|}{x} \times 100$. 

Example I. An estimated length is 2.56 ins., correct to three 
significant figures. 

There is a possible error of 0.005 in. in the estimate. 

The maximum absolute error   = 0.005 in. 
The maximum relative error   = 0.005/2.56 ≈ 0.002 
The maximum percentage error = 0.2%. 

Example II. An estimate of 352 is stated to have a possible error of 
± 3%. What are the absolute and relative errors? 

352 ± 3% = 352 ± 10.56. 

The maximum absolute error = 10.56, say 10.6, and the maximum 
relative error = 0.03. 
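
The three definitions translate directly into code. The sketch below is an illustration in Python, with hypothetical function names; following the definitions above, the relative error is taken on the estimate, not on the unknown true value. It reproduces Examples I and II.

```python
def relative_error(abs_error, estimate):
    """Relative error = absolute error / estimated value."""
    return abs_error / estimate

def percentage_error(abs_error, estimate):
    """Percentage error = relative error x 100."""
    return 100 * relative_error(abs_error, estimate)

def absolute_from_percentage(estimate, pct):
    """Absolute error implied by a stated percentage error."""
    return estimate * pct / 100

# Example I: 2.56 in. correct to three significant figures, so the error is at most 0.005 in.
rel_1 = relative_error(0.005, 2.56)
pct_1 = percentage_error(0.005, 2.56)

# Example II: 352 with a possible error of 3%.
abs_2 = absolute_from_percentage(352, 3)

print(round(rel_1, 3), round(pct_1, 1), round(abs_2, 2))
```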




I. Maximum Absolute Errors 
Addition and Subtraction 

Let $x_1, x_2, \ldots, x_n$ be any measurements, or estimates, and let 
$e_1, e_2, \ldots, e_n$ be the possible errors in $x_1, x_2, \ldots, x_n$, respectively. 
Then the maximum value of the sum of the estimates is 

\[ (x_1 + x_2 + \ldots + x_n) + (e_1 + e_2 + \ldots + e_n), \] 

i.e., $\Sigma x + \Sigma e$. 

Similarly the minimum value of the sum of the estimates is 
$\Sigma x - \Sigma e$. 

$\therefore$ the possible limits of the sum $= \Sigma x \pm \Sigma e$. 

Example III. Add together 500, 300 and 200, the possible error in 
each case being 1%. 

 500 ± 5 
 300 ± 3 
 200 ± 2 
 ------- 
1000 ± 10 

i.e., the sum may lie between 1010 and 990. 

Let $x_1$ and $x_2$ be two estimates with possible errors of $e_1$ and $e_2$ 
respectively. Then their difference, $x_1$ being greater than $x_2$, is 
$(x_1 - x_2) \pm (e_1 + e_2)$. 

Example IV. Subtract 300 ± 1% from 500 ± 1%. 

500 ± 5 
300 ± 3 
------- 
200 ± 8 

i.e., the difference may lie between 208 and 192. 

The maximum error, then, in the sum of a number of estimates, 
or in the difference between two estimates, is the sum of the several 
errors in the estimates. 
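
The rule that maximum errors add, in sums and differences alike, can be sketched as follows. This is an illustrative Python fragment with hypothetical helper names; each estimate is represented as a (value, maximum absolute error) pair.

```python
def sum_limits(estimates):
    """Return (sum of values, maximum absolute error of the sum)."""
    total = sum(value for value, _ in estimates)
    max_error = sum(error for _, error in estimates)
    return total, max_error

def difference_limits(a, b):
    """Return (a - b, maximum absolute error); the errors add, they do not cancel."""
    return a[0] - b[0], a[1] + b[1]

# Example III: 500, 300 and 200, each subject to a possible error of 1%.
example_3 = sum_limits([(500, 5), (300, 3), (200, 2)])

# Example IV: (500 with error 5) minus (300 with error 3).
example_4 = difference_limits((500, 5), (300, 3))

print(example_3, example_4)
```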

Multiplication 

Find the product of $x_1 \pm e_1$ and $x_2 \pm e_2$. 

The maximum product 

\[ = (x_1 + e_1)(x_2 + e_2) = x_1 x_2 + e_1 e_2 + x_1 e_2 + x_2 e_1. \] 

The minimum product 

\[ = (x_1 - e_1)(x_2 - e_2) = x_1 x_2 + e_1 e_2 - x_1 e_2 - x_2 e_1. \] 

Since $e_1$ and $e_2$ will presumably be relatively small their product, 
$e_1 e_2$, may be neglected and the limits of the product 
$(x_1 \pm e_1)(x_2 \pm e_2)$ taken as 

\[ x_1 x_2 \pm (x_1 e_2 + x_2 e_1). \] 

Example V. Multiply 13.5 by 3.75, given that each number is correct 
to three significant figures. 

(13.5 ± 0.05)(3.75 ± 0.005) ≈ 50.625 ± (13.5 × 0.005 + 3.75 × 0.05) 
                            ≈ 50.625 ± 0.255 

i.e., the product may lie between 50.88 and 50.37, or say between 50.9 and 
50.4. 
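
To see how little is lost by neglecting the product of the two errors, the exact limits can be set beside the approximate formula. The following Python sketch does this for Example V; the function names are illustrative and positive quantities are assumed.

```python
def product_limits_exact(x1, e1, x2, e2):
    """Smallest and largest possible product, assuming all quantities are positive."""
    return (x1 - e1) * (x2 - e2), (x1 + e1) * (x2 + e2)

def product_error_approx(x1, e1, x2, e2):
    """Maximum absolute error of the product, neglecting the term e1 * e2."""
    return x1 * e2 + x2 * e1

# Example V: 13.5 and 3.75, each correct to three significant figures.
low, high = product_limits_exact(13.5, 0.05, 3.75, 0.005)
approx_error = product_error_approx(13.5, 0.05, 3.75, 0.005)

print(round(low, 2), round(high, 2), round(approx_error, 3))
```

The exact limits differ from 50.625 ± 0.255 only by the neglected term, 0.05 × 0.005 = 0.00025.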

Division 

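Find the quotient when $x_1 \pm e_1$ is divided by $x_2 \pm e_2$. 

The maximum value of the quotient $= \dfrac{x_1 + e_1}{x_2 - e_2}$ 

and the minimum value of the quotient $= \dfrac{x_1 - e_1}{x_2 + e_2}$. 

$\therefore$ the limits of error of $\dfrac{x_1}{x_2}$ are 
$\left(\dfrac{x_1 + e_1}{x_2 - e_2} - \dfrac{x_1}{x_2}\right)$ and 
$\left(\dfrac{x_1}{x_2} - \dfrac{x_1 - e_1}{x_2 + e_2}\right)$. 

\[ \frac{x_1 + e_1}{x_2 - e_2} - \frac{x_1}{x_2} 
   = \frac{x_1 x_2 + x_2 e_1 - x_1 x_2 + x_1 e_2}{x_2^2 - x_2 e_2} 
   \approx \frac{x_1 e_2 + x_2 e_1}{x_2^2}, \] 

where $e_2$ is small. 

The other limit, i.e., $\dfrac{x_1}{x_2} - \dfrac{x_1 - e_1}{x_2 + e_2}$, has approximately the same value. 

$\therefore$ the maximum absolute error in $\dfrac{x_1}{x_2} \approx \dfrac{x_1 e_2 + x_2 e_1}{x_2^2}$. 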

Example VI. Divide 1320, correct to 4 significant figures, by 25, 
correct to 2 significant figures. 

Maximum quotient = 1320.5/24.5 = 53.898 

Minimum quotient = 1319.5/25.5 = 51.745 

∴ quotient = 52.82 ± 1.08 
i.e., the limits are about 53.9 and 51.7. 

By the formula, 

quotient = 52.8 ± 1.08. 

Note that the answer can be stated as correct to 1 significant figure 
only, viz. 50. 




In division it is usually preferable to ignore the formula and to 
obtain maximum and minimum values of the quotient directly. 
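
The direct method recommended here is easy to mechanise. The Python sketch below (function names are illustrative; positive quantities are assumed) computes the extreme quotients for Example VI and, for comparison, the error given by the approximate formula.

```python
def quotient_limits(x1, e1, x2, e2):
    """Smallest and largest quotient of (x1 +/- e1) / (x2 +/- e2), all positive."""
    return (x1 - e1) / (x2 + e2), (x1 + e1) / (x2 - e2)

def quotient_error_formula(x1, e1, x2, e2):
    """Approximate maximum absolute error: (x1*e2 + x2*e1) / x2**2."""
    return (x1 * e2 + x2 * e1) / x2 ** 2

# Example VI: 1320 (error up to 0.5) divided by 25 (error up to 0.5).
low, high = quotient_limits(1320, 0.5, 25, 0.5)
centre = (low + high) / 2        # mean of the two extreme quotients
half_range = (high - low) / 2    # half their difference
formula_error = quotient_error_formula(1320, 0.5, 25, 0.5)

print(round(low, 3), round(high, 3))
print(round(centre, 2), round(half_range, 2), round(formula_error, 2))
```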


Arithmetic Average 

Find the mean of the following: 

\[ x_1 \pm e_1,\; x_2 \pm e_2,\; \ldots,\; x_n \pm e_n. \] 

The sum of these estimates $= \Sigma x \pm \Sigma e$ 

$\therefore$ their mean $= \dfrac{\Sigma x}{n} \pm \dfrac{\Sigma e}{n}$, 

i.e., the maximum possible error in the mean of the estimates is $\dfrac{\Sigma e}{n}$, 
i.e., the mean error. 


Example VII. Find the mean of the following estimates: 

120 correct to  5% 
130    „    „  10% 
350    „    „  15% 
360    „    „  2½% 

Mean = ¼{(120 ± 6) + (130 ± 13) + (350 ± 52.5) + (360 ± 9)} 
     = ¼(960 ± 80.5) ≈ 240 ± 20. 
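
The working of Example VII can be retraced in a few lines. This Python sketch is illustrative only; it reads the percentage for the last estimate as 2½%, which is the value consistent with the ± 9 shown in the working.

```python
# (estimate, percentage error); the last percentage is read as 2.5 (see lead-in).
estimates = [(120, 5), (130, 10), (350, 15), (360, 2.5)]

# Convert each percentage error to an absolute error.
abs_errors = [value * pct / 100 for value, pct in estimates]

n = len(estimates)
mean = sum(value for value, _ in estimates) / n
mean_error = sum(abs_errors) / n   # maximum error of the mean = mean of the errors

print(abs_errors, mean, mean_error)
```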


II. Maximum Relative Errors 

Multiplication 

It has been shown that 

\[ (x_1 \pm e_1)(x_2 \pm e_2) \approx x_1 x_2 \pm (x_1 e_2 + x_2 e_1) \] 

where $e_1$ and $e_2$ are small, and that the maximum absolute error in 
the product $x_1 x_2$ may be taken as $(x_1 e_2 + x_2 e_1)$. 

The relative error in $x_1$ is $\dfrac{e_1}{x_1}$, that in $x_2$ is $\dfrac{e_2}{x_2}$, 
and that in $x_1 x_2$ is 

\[ \frac{x_1 e_2 + x_2 e_1}{x_1 x_2}, \quad \text{i.e.,} \quad \frac{e_1}{x_1} + \frac{e_2}{x_2}, \] 

i.e., the maximum relative error in the product $x_1 x_2$ is approximately 
equal to the sum of the maximum relative errors in $x_1$ and $x_2$, 
provided always that these errors are small. 

It follows that the maximum percentage error in the product 
$x_1 x_2$ is approximately equal to the sum of the maximum percentage 
errors in $x_1$ and $x_2$, provided that these errors are small. 





Example VIII. Find the limits of the product of two quantities x 
and y subject to possible errors of a% and b%, respectively, where a 
and b are small. 

\[ (x \pm a\%)(y \pm b\%) = xy\left(1 \pm \frac{a}{100}\right)\left(1 \pm \frac{b}{100}\right) 
   = xy\left(1 \pm \frac{a}{100} \pm \frac{b}{100} + \frac{ab}{10{,}000}\right). \] 

Since a and b are small, $\dfrac{ab}{10{,}000}$ is negligible. 

\[ \therefore \text{product} \approx xy\left(1 \pm \frac{a + b}{100}\right), \] 

i.e., $xy \pm (a + b)\%$, 

which agrees with the result obtained above. 


Division 


It has been shown that the maximum absolute error in $\dfrac{x_1}{x_2}$ may 
be taken as $\dfrac{x_1 e_2 + x_2 e_1}{x_2^2}$. 

$\therefore$ the maximum relative error in $\dfrac{x_1}{x_2}$ 

\[ \approx \frac{x_1 e_2 + x_2 e_1}{x_2^2} \div \frac{x_1}{x_2} = \frac{e_1}{x_1} + \frac{e_2}{x_2}. \] 

That is, the maximum relative error in $\dfrac{x_1}{x_2}$ is approximately equal 
to the sum of the maximum relative errors in $x_1$ and $x_2$, and 
consequently the maximum percentage error in $\dfrac{x_1}{x_2}$ is approximately equal 
to the sum of the maximum percentage errors in $x_1$ and $x_2$, always 
provided that these errors are small. 

The foregoing argument applies where more than two quantities 
and one operation are involved. For example, assuming that all 
the errors are small, 

\[ (x \pm a\%)(y \pm b\%)(z \pm c\%) \approx xyz \pm (a + b + c)\% \] 

also 

\[ \frac{(x \pm a\%) \times (y \pm b\%)}{(z \pm c\%) \times (p \pm d\%)} \approx \frac{xy}{zp} \pm (a + b + c + d)\%. \] 

A maximum percentage error of a% in measuring the side of a 
square involves, approximately, a maximum percentage error of 
2a% in the area of the square, and of 3a% in the volume of the 
corresponding cube. 

Useful Approximations 

If a, b, c are small errors, then such products as $a^2$, $ac$ may for the 
purpose of many calculations be regarded as negligible. 

\[ (1 \pm a)(1 \pm b) \approx 1 \pm a \pm b \] 
\[ (1 + a)(1 + b)(1 + c) \approx 1 + a + b + c \] 
\[ (1 \pm a)^2 \approx 1 \pm 2a \] 
\[ (1 \pm a)^3 \approx 1 \pm 3a \] 
\[ \frac{1}{1 - a} = (1 - a)^{-1} \approx 1 + a \] 

It can be verified by ordinary division that 

\[ \frac{1}{1 - a} = 1 + a + a^2 + \ldots, \] 

where, a being small, the 3rd and subsequent terms may often be 
neglected. 

Example IX. Find the possible limits of the answer to the following 
calculation: 

(120 ± 5)(340 ± 10) / (250 ± 7) 

This is equivalent to 

(120 ± 4.17%)(340 ± 2.94%) / (250 ± 2.8%) 

∴ limits ≈ (120 × 340)/250 ± 10% very nearly, 

i.e., 163.2 ± 16.3, say 180 and 147. 
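
As a check on Example IX, the percentage-sum rule can be compared with limits obtained by straightforward multiplication and division. The Python sketch below is an illustration with hypothetical variable names; all quantities are assumed positive.

```python
def pct(value, error):
    """Maximum percentage error of an estimate."""
    return 100 * error / value

# Approximate method: add the percentage errors of the three quantities.
total_pct = pct(120, 5) + pct(340, 10) + pct(250, 7)
central = 120 * 340 / 250
approx_low = central * (1 - total_pct / 100)
approx_high = central * (1 + total_pct / 100)

# Direct method: largest numerator over smallest denominator, and the reverse.
direct_high = (120 + 5) * (340 + 10) / (250 - 7)
direct_low = (120 - 5) * (340 - 10) / (250 + 7)

print(round(total_pct, 2), round(central, 1))
print(round(approx_low, 1), round(approx_high, 1))
print(round(direct_low, 1), round(direct_high, 1))
```

The two methods agree to within about one unit, which is as much as the rounded answer "180 and 147" claims.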

Readers who are not keen on symbols should note that although 
the foregoing formulae are useful, maximum and minimum results 
can be obtained by straightforward multiplication and division. 
Then the mean of these results plus or minus half their difference 
gives the answer 

xy ± maximum absolute error 

or x/y ± maximum absolute error, 

as the case may be. Most of the problems relating to “ errors ” 
involve only these simple, if rather laborious, calculations. 





Example X. 

POPULATION CENSUS, 1931 
ENGLAND AND WALES 

Proprietors and Managers of Retail Businesses 

Type                               Number      % of total 
1. Grocery and provisions          92,750        16.3 
2. Textile and other clothing      73,973        13.0 
3. Meat                            49,431         8.7 

Assuming that the "numbers" are correct and the percentages 
correct to one place of decimals, find between what limits the total 
number of proprietors may, according to the above data, lie. 

From 1. Total may range between (92,750 × 100)/16.25 and (92,750 × 100)/16.35, 
i.e., between 570,769 and 567,278. 

From 2. Total may range between 571,220 and 566,843. 

From 3. Total may range between 571,457 and 564,914. 

∴ total must be between 571,457 and 564,914, say between 
571,000 and 565,000. 

The actual census figure for the total was 569,127. 
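
The three ranges can be recomputed as below. This Python sketch is an illustration, not the author's working: the percentage for grocery is taken as 16.3, the value implied by the divisors 16.25 and 16.35, and the sketch additionally reports the overlap of the three ranges, which is the narrowest interval consistent with all three rows at once.

```python
# (type, number, percentage of total stated to one decimal place)
rows = [
    ("Grocery and provisions", 92_750, 16.3),
    ("Textile and other clothing", 73_973, 13.0),
    ("Meat", 49_431, 8.7),
]
HALF_UNIT = 0.05  # a percentage given to one decimal may be out by up to 0.05

# For each row, the total lies between number/(p + 0.05)% and number/(p - 0.05)%.
ranges = [
    (number * 100 / (p + HALF_UNIT), number * 100 / (p - HALF_UNIT))
    for _, number, p in rows
]

widest = (min(lo for lo, _ in ranges), max(hi for _, hi in ranges))
overlap = (max(lo for lo, _ in ranges), min(hi for _, hi in ranges))

for (name, _, _), (lo, hi) in zip(rows, ranges):
    print(f"{name}: {lo:,.0f} to {hi:,.0f}")
print("overlap:", tuple(round(v) for v in overlap))
```

The census total of 569,127 falls inside the overlap as well as inside the wider limits quoted in the text.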

Example XI. The Bankers' Clearing-house Return for June 1946 
includes the following data for the nine Clearing Banks. 

Total deposits                     £4797 mn. 
Ratio of cash to deposits          10.4% 
Ratio of advances to deposits      17.5% 

Estimate (a) cash and (b) advances, stating limits of error. 

(a) Cash = (47.97 ± 0.005)(10.4 ± 0.05) 
         ≈ 498.89 ± 2.45 
         say £(499 ± 2.5) mn. 

(b) Advances = (47.97 ± 0.005)(17.5 ± 0.05) 
             ≈ £(839.5 ± 2.5) mn. 
             say between £842 mn. and £837 mn. 

The actual figure in the return for "advances" is £841 mn. 

Rounding Off Numbers 

It is frequently desirable to round off a series of numbers to the 
“ nearest thousand ", or to the “ nearest first place of decimals ", 
for example. 

In this process it is highly desirable that a uniform procedure be 
adopted. 

A method commonly employed is illustrated in the following 
example: 





Round off the numbers 156, 188, 234, 345, 455 to the nearest 10. 

The first three numbers present no difficulty; they obviously 
become 160, 190, 230. 

In the case of the last two numbers it would be equally legitimate 
to call them 340 or 350, and 450 or 460. It is customary to round 
off such cases to the nearest even number, so that 345 becomes 
340 and 455 becomes 460. 
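
This "round half to even" convention happens to be the one Python's built-in `round` applies to integers, so the example can be reproduced directly. A small illustrative sketch:

```python
numbers = [156, 188, 234, 345, 455]

# round(n, -1) rounds an integer to the nearest 10; exact halves go to the
# multiple of 10 whose tens digit is even.
rounded = [round(n, -1) for n in numbers]

print(rounded)
```

Under the same rule 355 would become 360, since 36 is even.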

So far only the maximum possible errors in calculations made 
with “ inaccurate numbers ” have been considered. 

In practice, however, it is unlikely that maximum errors will 
occur. The error in the sum of n items, for example, will be a 
maximum only if the error in each item is a maximum and if all the 
errors are of the same sign; when n is fairly large the probability 
of a maximum error in the sum is negligible. 

A maximum error in the product of n numbers likewise can occur 
only if the same conditions are fulfilled; while a maximum error in a 
quotient requires a maximum error in each of the numbers and a 
difference between the signs of the two errors. 

Biased and Unbiased Errors 

When all the errors in a set of numbers are in the same direction 
—i.e., all in excess or all in defect of the " true ” value—they are 
said to be biased. If, on the other hand, positive and negative 
errors are equally likely, the errors are said to be unbiased. 

Example XII. The following table gives the expenditure, on the 
same item, of ten towns, written down in three different ways. 

Expenditure 

(i)        (ii)                (iii)                    (iv) 
                               Rounded off to the       With the last three 
                               nearest thousand £       figures omitted 
Town       As recorded (£)     (£000)                   (£000) 
A              25,326               25                       25 
B              18,942               19                       18 
C              57,564               58                       57 
D              43,201               43                       43 
E              39,750               40                       39 
F              33,306               33                       33 
G              17,837               18                       17 
H              24,625               25                       24 
I              85,135               85                       85 
J              49,617               50                       49 

              395,303              396                      390 





The total of col. (iii) agrees closely with that of col. (ii), the relative 
error being less than 0.2%; the relative error in the total of col. (iv) is 
about 1.3%. 
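
The contrast between the two ways of shortening the figures can be reproduced as follows. This Python sketch is illustrative; the recorded figure for town F is not legible in the source and is inferred here as 33,306, the value that makes the column add up to the printed total of 395,303.

```python
recorded = {
    "A": 25_326, "B": 18_942, "C": 57_564, "D": 43_201, "E": 39_750,
    "F": 33_306,  # inferred from the printed total (assumption, see lead-in)
    "G": 17_837, "H": 24_625, "I": 85_135, "J": 49_617,
}

true_total = sum(recorded.values())

# Col. (iii): round to the nearest thousand; col. (iv): drop the last three digits.
rounded_total = sum(round(v, -3) // 1000 for v in recorded.values())
truncated_total = sum(v // 1000 for v in recorded.values())

# Relative errors of the two shortened totals, as percentages of the true total.
rounded_rel_pct = 100 * (rounded_total * 1000 - true_total) / true_total
truncated_rel_pct = 100 * (truncated_total * 1000 - true_total) / true_total

print(true_total, rounded_total, truncated_total)
print(round(rounded_rel_pct, 2), round(truncated_rel_pct, 2))
```

Every truncation error is negative, so they accumulate; the rounding errors are of both signs and largely cancel.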

It can easily be seen that with rounded numbers the greater the 
number of items included in a sum the more likely it is that the 
errors in the items will tend to cancel out. Such errors are sometimes 
said to be "compensating". 

In the case of col. (iv) in Example XII the errors in the items 
must necessarily be negative. They are, therefore, biased errors, 
and their effect is "cumulative". The more items there are in 
the sum the greater will the absolute error in the total become. The 
practice of dropping digits in this fashion should never be adopted. 

It is sometimes stated, without qualification, that unbiased errors 
are compensating and biased errors cumulative. This, however, is 
not necessarily true. 

We have seen that it is true in addition, but in subtraction, 
if the errors in both numbers are of the same sign, the difference 
between the numbers will be nearer the true value than if one error 
is positive and the other negative. 

If two numbers are multiplied together, then errors of opposite 
signs will yield a product nearer to the true value than if both errors 
are of the same sign; that is, unbiased errors tend to be com¬ 
pensating in multiplication. But in division if the errors in the 
two numbers are of opposite sign the quotient will be further from 
the true value than if both were of the same sign. Thus in division 
the effect of biased errors may be to make the error in the answer 
less than the errors in the estimates. 


Significant Figures and Approximations 

Example XIII. The length and breadth of a rectangle are given as 
17.32 ft. and 16.45 ft., correct in each case to four significant figures. 
How accurately can the area of the rectangle be stated? 

The number of significant figures in the answer cannot exceed, and 
must generally be less than, the number in the data. To state the 
answer as 17.32 × 16.45 = 284.9140 sq. ft. is wrong, since with the 
given data the area may be anything between 285.083 and 284.745 
sq. ft. The area may be given as (284.9 ± 0.2) sq. ft., but it cannot 
be stated correct to more than three significant figures, viz. 285 sq. ft. 
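
The limits quoted in Example XIII follow from widening each dimension by half a unit in its last decimal place. A brief Python sketch, with an illustrative helper name:

```python
def rounding_limits(value, decimals):
    """Lower and upper limits of a number stated correct to `decimals` places."""
    half_unit = 0.5 * 10 ** (-decimals)
    return value - half_unit, value + half_unit

length_lo, length_hi = rounding_limits(17.32, 2)
breadth_lo, breadth_hi = rounding_limits(16.45, 2)

area_min = length_lo * breadth_lo
area_max = length_hi * breadth_hi
area_mid = (area_min + area_max) / 2
area_err = (area_max - area_min) / 2

print(round(area_min, 3), round(area_max, 3))
print(round(area_mid, 1), round(area_err, 1))
```

Both limits round to 285 at three significant figures but disagree in the fourth, which is why no more than three can be claimed.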

Generally speaking, if two numbers, each with x significant 
figures, are multiplied together, not more than (x - 1) significant 
figures in the product can be relied upon, and cases will occur when 
not more than (x - 2) such figures are reliable. The same rule 
holds good also in the case of division. 

Although care should always be taken not to state a result to more 
significant figures than the accuracy of the data warrants, it is 
important to avoid the introduction of arithmetical errors in the 
result by discarding figures too freely or too soon. It is better to 
retain too many figures than too few. In the first case the 
unreliable digits can be dropped when stating the results; in the 
second case the answer is probably wrong, or at best not stated as 
accurately as the data actually permit. 


Exercise 4 

1. Write down n large numbers at random; "round the numbers off" to 
the nearest 1000 and obtain their sum. What is the maximum possible error 
in such a sum? What is the maximum possible error in the average of n 
such rounded numbers? 

2. Multiply 18.5 by 172.5, given that all the figures are significant. State 
your answer in the form p ± e. To how many significant figures may the 
product be stated? 

3 The length and breadth of a rectangular field are recorded by a surveyor 
as 1320 ft and 1264 ft., respectively. If the chainage may be in error by 
4 - \°/ 0 , what are the maximum and minimum values of the area ? 

4. A rectangular field is L yds. long and W yds. wide. The relative errors 
in L and W are known to be less than x and y respectively. What are the 
possible absolute and relative errors in the area of the field stated as area = 
L × W sq. yds.? 

5. The dimensions of a room, stated correct to the nearest foot, are 20 ft., 
16 ft. and 10 ft. Find the maximum and minimum values of (a) the perimeter 
of the room, (b) the area of the floor, (c) the area of the four walls and (d) the 
volume of the room. Give also in each case the maximum possible percentage 
error involved in calculating with the dimensions as given. 

6. The estimated cost of making a rough light railway is $10,000 per 
mile, and the estimated length of line required is 100 miles. Find the 
cost of making the line if the estimates may be in error by 10% and 5%, 
respectively. 

7. S = P + Q. S is estimated at 156,000, correct to the nearest 1000, 
and P at 49,500, correct to the nearest 100. Find the value of S/Q, with the 
possible limits of error. 

(Inst. T., 1941) 

8. At a certain date the population of a country was 11,897,000. At the 
same date the number of persons per private car in the country was stated as 
32.6, and the number of private cars per 1000 persons as 30.7. From these 
data make an estimate of the number of private cars in the country. Neglect 
error in population. 



9. Expenditure and Taxation 1939-44 (£mn.) 

Year      Expenditure      From taxation      % from taxation 
1939         1054               896                 - 
1940         1810                -                 56.0 
1941         3807                -                 35.1 
1942           -               1962                41.1 
1943         5788              2948                 - 
1944         5937                -                 50.1 

Supply appropriate figures for the blank spaces in the table. 

10. The operation A — ^ is performed on the slide rule If the errors m 

B and C are estimated at _L 1% and ±2% respectively, and the error m 
the slide rule at J- 2 5%, within what limits may A lie ? 

(Soc 1 A A ) 

11. The three dimensions of a rectangular block of stone are x_1, x_2 and x_3 
ft. respectively, with a relative error of 0.01 in each case. What is the greatest 
possible percentage error in the volume stated as x_1 x_2 x_3 cu. ft., and what is 
the maximum absolute error in the volume? 

12. The population of the United Kingdom was 31.5 mn. in 1871 and 
37.7 mn. in 1891. Find the greatest and least possible values of the increase 
per cent. in the value of imports per head, at the prices of 1902, in 1891 as 
compared with 1871, consistent with the data, which are all stated correct 
to three significant figures. 

Value of imports in 1871 at 1902 prices      £165 mn. 
Value of imports in 1891 at 1902 prices      £339 mn. 

(L.U. B.Sc. Econ.) 

13. The total number of occupied males in Great Britain in 1921 and in 
1931 are given as 13,612,000 and 14,790,000 respectively. 

Percentage Age Distribution of Occupied Males 

       Age range              1921       1931 
(a)    14 and under 16         3.9        3.0 
(b)    16  „    „   18         5.3        4.7 
(c)    18  „    „   25        17.2       17.5 
(g)    55  „    „   65        10.5       12.2 

[Annual Abstract of Statistics] 

From the above data calculate the numbers in the various age groups (i) in 
1921 and (ii) in 1931, stating in each case the limits of error due to the 
percentages being given to the nearest 0.1%. 

What is the percentage decrease in numbers for age ranges (a) and (b) in 
1931 as compared with 1921? 

14. Return of Nine Clearing Banks, Dec. 1944 

Total Advances                       £717 mn. 
Ratio of Advances to Deposits        16.6% 
Ratio of Cash to Deposits            11.0% 

Estimate: (a) the amount of deposits, 
          (b) the amount of cash. 

The data are stated correct to three significant figures. 

15. U.K. Coal Production 

                                        1911       1925 
Output (mn. tons)                        272        243 
Number of persons employed ('000)       1045       1103 

Calculate to the nearest significant figure the output of coal per person 
employed in 1911 and in 1925, and the percentage change in output per 
person employed between these dates. 

(L.U. B.Com., 1932) 


16. England and Wales, 1930 

               Number       Per 1000 of population 
Births         605,059           14.8 
Deaths         495,798           12.1 
Marriages      353,361           17.3 * 

* Number of persons married, not number of marriages, is the numerator 
for this entry. 

Find the maximum and minimum population consistent with the figures 
in this table, remembering that the "per 1000" entries are computed to the 
nearest unit in the first decimal place. 

(L.U. Inter. Com.) 


17. 
                              Yield in bushels 
District       Acreage        per acre 
A              435,000           31.2 
B              273,000           25.7 
C              527,000           43.4 

Make an estimate of the total yield: (a) for each separate district and 
(b) for the three districts combined, bearing in mind that the acreage is given 
to the nearest 1000 acres, and the yield to the nearest one-tenth bushel. 

18. The population of the United Kingdom as recorded in the census of 
1931 was 46,038,000 Between 1911 and 1921 there was an increase of 4*6%, 
and between 1921 and 1931 an increase of 4-6%. 

Make an estimate of the population in 1M4, stating limits of error. 

19. In 1911 the number of structurally separate dwellings in England and 
Wales was 7,691,000. From 1911 to 1921 the number rose by 3.7% and 
from 1921 to 1931 by 17.8%. In 1931 the number of structurally separate 
dwellings was 9,400,000. The number of dwellings being stated correct to 
the nearest thousand and percentages correct to one place of decimals, find 
between what limits the number of dwellings in 1921 may, according to the 
data, have lain. 

20. The total number of insured workpeople in Great Britain and Northern 
Ireland is estimated at July in each year. The actual number unemployed is 
known each month, and the ratio of this figure to the estimated number of 
insured at the previous July is published as the current percentage unemployed. 

The percentage unemployed at July 1925 was originally shown as 11.5, 
but later, when the estimate was made of insured workpeople at July 1925, 
this was modified to 11.2. 

What are the possible limits we may take as the percentage increase in 
the estimated number of insured workpeople from July 1924 to July 1925, 
the calculation being based on the two figures given above? 

(Inst. T.) 

Assume that the percentages are stated correct to one place of decimals. 

21. If the recorded births in a certain district may be in defect by x% and 
the estimated population in error by ± y%, find an appropriate expression for 
the greatest possible error in the birth rate, x and y being assumed fairly 
small, say not more than 5%. 

Birth rate = (no. of births × 1000) / population 

(L.U. B.Sc. Econ.) 


22. Employment in Great Britain 

                                  June, 1939.            April, 1940. 
Total employed in Industry        17,920,000 = 100%      16,622,000 = 100% 
Employed in: 
  Agriculture                        5.1%                   6.1% 
  Transport and Shipping             7.1                    8.4 
  Mining                             4.9                    4.8 

[Adapted from Ministry of Labour Gazette] 

(i) Estimate the numbers in the different industries at both dates. 
(ii) Estimate the possible percentage changes in numbers employed in the 
different industries in April 1940 as compared with June 1939. 

23. If the percentage of unmarried males over 15 years of age marrying in 
a certain year was 72.5 and the corresponding percentage for unmarried 
females over 15 years of age was 61.5, find what percentage of the unmarried 
population over 15 years of age were males: 

(a) Assuming the figures as stated to be correct. 

(b) Taking into consideration that the percentages are stated correct 
to one place of decimals. 


24. London, 1926 

Population (estimated): 4,615,400 

Registered in year             Births      Deaths      Marriages 
(a) Number                     78,825      53,476       39,734 
(b) Per 1000 of population      17.1        11.6        17.2 * 

* This is the number of persons married × 1000/population, i.e., 
(39,734 × 2) × 1000/population. 

Using data (a) and data (b) find within what limits the population may lie. 

(L.U. Inter. Com.) 





25. Find the value of the expression if the numbers, taken in 

order, are subject to possible errors of 4, 3, 2 and 1% respectively. 

Use formula and check by direct calculation 

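26. Show that the maximum and minimum values of the expression 

\[ \frac{(A \pm a\%)(B \pm b\%)}{(C \pm c\%)(D \pm d\%)} \] 

are approximately 

\[ \frac{AB}{CD}\left(1 + \frac{a + b + c + d}{100}\right) \quad \text{and} \quad \frac{AB}{CD}\left(1 - \frac{a + b + c + d}{100}\right) \] 

respectively, if a, b, c and d are small. 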


27. The following data relate to British grain production in 1939: 

              Acreage            Output per acre 
Crop          ('000 acres)       (cwts.) 
Wheat            1766               19 
Barley           1013                - 
Oats             2427               16 

Total            5206               17 

[Annual Abstract of Statistics; Source: Agricultural Departments] 

Acreage is given to the nearest thousand acres and output to the nearest 
cwt. Calculate the total output of wheat, of oats, and of all three grains, 
indicating the maximum error in each figure. By difference, estimate the 
output of barley and the output per acre. What is the maximum error in 
these estimates? Comment on this procedure. 

(L.U. B.Sc. Econ., 1946) 

28. In 1937 the expenses of a company were x% of the gross profit. In 
1938 the gross profit was less by y% than in 1937, and expenses were z% of 
the gross profit of 1938. Find the percentage reduction in the net profit of 
1938 as compared with that of 1937 (a) accurately, (b) approximately, if x, y 
and z are small. 


Additional Exercises on Chapters I-IV

1. Percentage Age Distribution of the Estimated Total Population
of the United Kingdom, 1947.

Age             % of total
0-4             8.2
5-14            12.9
15-24           14.1
25-34           15.3
35-44           15.7
45-54           13.0
55-64           10.3
65-69           4.1
70 and over     6.4
                100.0

[Derived from Annual Abstract of Statistics, 1937-47]



120 INTRODUCTION TO STATISTICAL CALCULATIONS 

Draw an ogive and from it determine

(i) the median and quartile ages;

(ii) the percentage of the population in the age ranges 10-19, 20-29,
30-39, etc.

Check the answers to part (i) by calculation.

2. Represent the following group in the form of an ogive. Read off the
median and quartile wages. How many of the group of 1000 wage-earners
were in receipt of (a) 42s. 6d. and less than 47s. 6d., (b) 47s. 6d. and less than
52s. 6d. per week?

Weekly wage (shillings):   30-   35-   40-   45-   50-   55-   60-   65 and less than 70
No. of wage-earners:       9     108   488   230   112   30    16    7         Σf = 1000

3. Calculate the mean weekly wage and the quintile weekly wages of the
above distribution (Question 2) and in the cases of the quintiles check
your results by interpolation on your ogive.

4. Calculate the mean and median lengths of the following distribution,
given that the measurements were recorded to the nearest one-eighth of an
inch.

x (inches):       10-   11-   12-   13-   14-   15-   16-   17-   18-   19-20
f (frequency):    2     7     12    22    30    24    17    11    9     6        Σf = 140


5. Administrative County of the West Riding of Yorkshire

Agricultural Wages for Males, March 1949

                       Min. rate for     Overtime rate
                       47-hour week      per hour, weekdays    No. of workers
Age of worker          (shillings)       s.  d.                in a group
15 and under 16        44                1   2                 9
16  "    "   17        50                1   4                 15
17  "    "   18        60                1   7                 18
18  "    "   19        68                1   10                15
19  "    "   20        75                2   0                 17
20  "    "   21        85                2   2                 23
21 and over            94                2   6                 32
                                                               129


(i) Calculate the average weekly wage of the group of 129 male workers at
the minimum rate.

(ii) Calculate the average weekly wage of the group if each member of the
group puts in 2 hours' overtime on each day, Monday to Friday, inclusive.

6. A man walked from X to Y, doing 3 m.p.h. for the first half of the
distance and 4½ m.p.h. for the second half. He cycled back to X, doing
9 m.p.h. for the first half of the distance and 13½ m.p.h. for the second half.
Calculate his average speed in m.p.h.

(1) from X to Y,

(2) from Y to X and

(3) for the complete journey.






7. Erewhon U.D.C. Census 19--

No. of         No. of private families occupying          Total no.
persons in     the following no. of rooms                  of private
family         1     2     3     4     5     6     7       families
(a)            (b)   (c)   (d)   (e)   (f)   (g)   (h)     (i)
1              4     3     3     -     -     -     -       10
2              3     10    12    4     -     -     -       29
3              1     20    30    10    2     -     -       63
4              -     3     12    18    3     1     -       37
5              -     1     3     8     25    10    -       47
6              -     -     1     4     8     25    2       40
7              -     -     -     1     2     10    2       15
8              -     -     -     -     1     2     -       3
9              -     -     -     1     -     2     2       5
10             -     -     -     1     2     1     -       4
Total no. of
private
families       8     37    61    47    43    51    6       253
Total rooms
occupied
Total no. of
persons

Complete the table and compute cols. (j) to (p) as in Exercise 1, No. 36.
8. In a group of 500 wage-earners the weekly wages of 4% were under 60s.
and those of 15% were under 62s. 6d.; 15% of the workers earned 95s. and
over and 5% of them got 100s. and over.
The median and quartile wages were 82s. 3d., 72s. 9d. and 90s M , the
fourth and sixth decile wages were 78s M and 85s. 4d. respectively.

Put the above information in the form of a frequency distribution and
estimate the mean wage of the 500 wage-earners therefrom.


9. Age Distribution of 1000 Wage-earners

Age (years):   14-15   16-17   18-20   21-24   25-29   30-34   35-39
Frequency:     72      87      78      107     114     122     96

Age (years):   40-44   45-49   50-54   55-59   60-64
Frequency:     78      70      67      56      53        Σf = 1000

Compute (a) the mean age,

(b) the median and quartile ages.









10. A man gets three successive annual rises in salary of 20%, 30% and 
25% respectively, each percentage being reckoned on his salary at the end of 
the previous year. How much better or worse off would he have been 
if he had been given three annual rises of 25% each, reckoned in the same 
way ? 

11. The numbers 8, 25, 17 and 30 have weights 3, 2, x and 1 respectively.
If their weighted A.M. is 17.2, find the value of x.

12. The four numbers 8, 25, 17 and 30 have a weighted G.M. of 15.3. If
the weights of the first three numbers are 5, 3 and 4 respectively, find the
weight of the fourth number.

13. The G.M. of a numbers is x and that of b other numbers is y. What
is the geometric mean of the (a + b) numbers?

14. The numbers 3.2, 5.8, 7.9 and 4.5 have frequencies x, (x + 2), (x - 3)
and (x + 6) respectively. If their weighted A.M. is 4.876 find the value
of x.

15. A motor car, when starting from rest, travels the first twentieth of a
mile at 6 m.p.h. and the next three twentieths at respectively 8, 12 and
24 m.p.h., but its average speed over the first fifth of a mile is not 12.5 m.p.h.
but 9.6. Explain the apparent paradox.

(R.S.S. Certificate, 1949)

16. From the following table estimate the total turnover and the average
value of a single order and the value of the median order. Comment on the
difference between these latter two statistics.

Analysis of a Firm's Order Book

Value of order (£)       Number of buyers
Under 10                 28
10 and under 20          356
20 and under 30          287
30 and under 40          154
40 and under 50          67
50 and under 60          35
60 and under 70          16
70 and under 80          7
Total                    750

(Inst. T., 1949)


17. Calculate the geometric and harmonic means of the following distribution:

x:   4.5-   5.5-   6.5-   7.5-   8.5-   9.5-   10.5-11.5
f:   2      4      7      10     9      6      2           Σf = 40

18. Give the average return and identify the form of average in each of
the following cases:

(1) A man invests equal sums of money at x%, y% and z% respectively.

(2) He invests £a, £la and £4a respectively at each of the above rates.

(3) He invests in concerns paying x%, y% and z% respectively such
sums that his income from each investment is the same.





19. Grocers, Provision Merchants and General Food Shops

(Analysis by Size of Shops at September, 1947)

No. of sugar registrations     All shops
1-49                           26,525
50-99                          29,420
100-199                        30,610
200-299                        15,349
300-499                        17,212
500-999                        17,014
1000-1999                      7,556
2000 and over                  2,362
                               146,048

[Annual Abstract of Statistics, 1937-47; Source: Ministry of Food]

(i) Assuming an upper limit of 2999 sugar registrations, calculate the
mean number and the median number. Use (a) the frequencies rounded
off to the nearest thousand and (b) the frequencies expressed as percentages,
correct to one place of decimals, and compare the results.

(ii) It is required to construct a histogram to illustrate the above table.
If the horizontal scale is to be 1 in. = 500 S.R., and the area scale
1 sq. in. = 20% of shops, show that the heights of the columns will be:

9.29", 10.05", 5.25", 2.625", 1.475", 0.58", 0.13", 0.04"

20. Butchers: Great Britain

Analysis by Size of Shop at November, 1947
(All shops)

Weekly retail turnover       Number of shops
* Under £15                  2,265
£15 and under £30            5,817
£30   "    "  £45            8,062
£45   "    "  £60            7,412
£60   "    "  £75            5,949
£75   "    "  £90            4,231
£90   "    "  £105           3,021
£105  "    "  £120           2,183
£120  "    "  £135           1,501
£135  "    "  £150           1,093
£150  "    "  £165           806
£165  "    "  £180           610
£180  "    "  £195           420
£195  "    "  £210           316
£210  "    "  £225           267
* £225 and over              1,232
                             45,185

[Annual Abstract of Statistics, 1937-47; Source: Ministry of Food]
* Take range as £10-£250.

Show that the mean and median weekly turnovers are about £71 and £58
respectively. Use the frequencies rounded to the nearest ten as weights.


21. Marks of 310 Candidates

                 Units
Tens     0     1     2     3     4     5     6     7     8     9     Total
0        -     -     -     -     -     -     1     -     -     -     1
10       -     -     -     -     -     1     1     1     -     1     4
20       1     2     1     3     1     4     1     1     1     4     19
30       10    8     8     9     16    13    8     3     11    4     90
40       9     5     2     8     4     10    6     2     8     10    64
50       17    9     3     3     4     7     4     8     6     -     61
60       11    10    2     5     3     8     5     1     3     8     56
70       5     2     1     -     2     3     1     -     -     -     14
80       1     -     -     -     -     -     -     -     -     -     1
Total    54    36    17    28    30    46    27    16    29    27    310

The table gives the distribution of the marks of 310 candidates in an
examination.

Thus there is one candidate with 6 marks, three with 23, and so on.
Calculate the average mark for the whole group.

(R.S.S. Certificate, 1948)

22. Estimate the value of the mean from the following table:

Range of marks:   0-   10-   20-   30-   40-   50-   60-   70-   80-   Total
(f):              1    4     19    90    64    61    56    14    1     310

To what do you attribute the difference between the results?

(R.S.S. Certificate, 1948)

23. Determine the median mark in the above distribution:

(i) direct from the table of Question 21;

(ii) by interpolation in the table of Question 22.

24. Find the arithmetic mean and the median of the distribution of the
heights of schoolboys in the table below. The heights in the original data
were recorded to the nearest one-eighth of an inch.

Height in inches     Frequencies
30-                  8
33-                  72
36-                  350
39-                  1,193
42-                  1,914
45-                  2,178
48-                  2,196
51-                  1,913
54-                  1,115
57-                  361
60-                  69
63-                  13
                     11,382

(R.S.S. Certificate, 1947)

25. Find the quartile heights of the 11,382 schoolboys. Calculate your 
answers and check on ogive. 




26. Boot and Shoe Industry: Size of Manufacturing
Establishments in 1935

Size of establishment                        No. of persons    Net output per
(average no.            No. of               employed          person employed
employed)               establishments
(i)                     (ii)                 (iii)             (iv)
11-24                   160                  2,480             £197
25-49                   145                  5,640             147
50-99                   193                  13,760            155
100-199                 147                  20,790            155
200-299                 64                   15,320            153
300-399                 35                   11,760            169
400-499                 19                   8,440             171
500-749                 22                   13,030            167
750-999                 9                    7,670             175
1000 and over           14                   17,670            162
                        808                  116,560

[Census of Production, 1935]

Taking the upper limit as 1499:

(a) From cols. (i) and (ii) calculate the mean and median size of
establishments.

(b) From cols. (ii) and (iii) find the average numbers employed in
establishments of each size and of all sizes and discuss the relationship
between size of firm and efficiency as measured by output per person
employed.

(c) You are required to draw a histogram to illustrate cols. (i) and (ii).
If the area of the diagram is to be 8.08 sq. in. and the horizontal scale
1 in. = 200 employees, calculate the height of the column over each
interval of the table.

27. A dealer in spirits says in respect of five of his customers that a bottle
of whisky lasts 8, 10, 20, 5 and 10 days respectively. What is the average
"life" of a bottle?

28.

No. of days of "life" of a bottle of whisky:   8    10   25   5    10
No. of customers:                              10   8    4    10   12    Σf = 44

(i) Calculate the average life of a bottle for the above group.

(ii) Calculate the total consumption of whisky per annum by the 44
customers.

29. The population of a country at the census of June 1900 was 14.75
millions; at the census of June 1910 it was 10.45 millions. Assuming a
"compound interest" law of growth, estimate the population at June
1905.

30. Two men, X and Y, started working for the A.B.C. Co. at similar
jobs on Jan. 1, 1930. X asked for an initial salary of £300 with an annual
rise of £30; Y asked for an initial salary of £200 with a rise of £15 every six
months. (i) What were their respective salaries on Dec. 31, 1939, assuming
that these arrangements remained unaltered, and (ii) what total amount had
each of them been paid in salary during their 10 years with the firm?

Note. The nth term of the series a, a + d, a + 2d, etc., is a + (n - 1)d,
and the sum of n terms of the series is (n/2)(a + l), where l is the value of the
nth term.
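The two formulae in the Note can be checked numerically. The sketch below is illustrative only: the function names and the figures a = 100, d = 10, n = 5 are assumptions chosen for the check and are not taken from Question 30.

```python
def nth_term(a, d, n):
    """n-th term of the series a, a + d, a + 2d, ..."""
    return a + (n - 1) * d


def series_sum(a, d, n):
    """Sum of the first n terms, (n/2)(a + l), where l is the n-th term."""
    return n * (a + nth_term(a, d, n)) / 2


# Hypothetical figures: first term 100, common difference 10, five terms.
a, d, n = 100, 10, 5
terms = [a + k * d for k in range(n)]   # the series written out in full
last = nth_term(a, d, n)
total = series_sum(a, d, n)
print(last, total)
```

Writing the series out in full and adding it term by term gives the same last term and the same total, which is all the formulae claim.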


31. Coal Production in Great Britain
(Weekly Averages)

           Production of           Wage-earners on      Average no. of shifts
           saleable mined coal     colliery books       worked per wage-earner
1947       ('000 tons)             ('000)               per week
Jan.       3587                    694                  4.87
Feb.       3630                    697                  4.94
Mar.       3717 *                  703                  5.03
Apr.       3494                    710                  4.79
May        376(5                   714                  4.86
June       3495 *                  717                  4.52

* Average of 5 weeks.

[Monthly Digest of Statistics; Source: Ministry of Fuel and Power]

Estimate (i) the number of man-shifts worked and (ii) the average output
per man-shift in each "month".


32. Estimated Age Distribution of the Average Number of
Wage-earners on Colliery Books

                   No. of wage-earners (thousands)
Age group          1931      1937      1945
14 and 15          31        28        12
16  "  17          51        43        22
18  "  19          51        37        50
20-25              129       107       99
26-30              112       98        52
31-35              99        101       76
36-40              84        90        89
41-45              75        73        86
46-50              69        64        65
51-55              59        54        57
56-60              48        43        48
61-64              24        24        29
65 and over        18        16        24
Totals:            850       778       709

[Ministry of Fuel and Power, Statistical Digest, 1945]

What changes have taken place in the age composition of miners?
Compute any averages and ratios which you think would help to a better
understanding of this table.

(L.U. B.Sc. Econ., 1947)



33. U.K. Population and Housing in Selected Towns, October, 1942

                              Persons per     % of occupied dwellings    % of occupied dwellings
                              dwelling        with more than one         with more than two
Town           Population     unit            person per room            persons per room
Reading        116,193        4.15            25.1                       13.9
Slough         63,712         3.86            12.4                       4.1
Cheltenham     60,488         3.82            23.4                       1.7
Swindon        67,862         3.86            22.2                       1.8
Blackpool      162,128        3.61            34.0                       1.3
Southport      88,708         3.68            19.7                       2.5

Compute for the above six towns the average percentage of occupied
dwellings with (a) more than one person per room, (b) more than two persons
per room.

Give an account of how the population figures shown above are estimated,
and give the definitions of other statistical terms in the table.

(L.U. B.Com., 1946)

[From The Impact of the War on Civilian Consumption, H.M.S.O.;
Source: Ministry of Health]

34. Distribution of Households According to Numbers of Earners and
Non-earners, Stepney, 1946


No of non- 

eainers 




No 

of earners 




Total 

0 

1 

2 

3 

4 

5 

6 

7 

8 

0 


17 

34 

8 

3 

o 

1 

_ 

_ 

Go 

1 

25 

46 

36 

30 

5 

1 

1 

2 


146 

2 

8 

38 

30 

8 

3 

— 

. _ 

— 

- 

87 

3 

4 

1 

23 

15 

7 

2 

3 

5 

6 

1 

1 

- 

•— 

1 

42 

23 

5 

6 

1 ~ 

o 

1 

1 

— 

1 

— 

- 

~ 


4 

1 

7 

i: 

o 

1 

— 

— 

- 


- 

- 

3 


34 

143 

112 

54 

19 

4 

o 

o 

1 

371 


[Source: Unpublished material from a sample survey of Stepney]

From the total row and column, obtain the arithmetic mean number of
earners and the arithmetic mean number of non-earners. From the body of
the table, calculate the proportions of those families with earners which have:
(a) less than one non-earner per earner, and (b) two or more non-earners per
earner.

(L.U. B.Sc. Econ., 1947)




35. Distribution of Pits According to Size and Output

                          Output of Coal per Man-shift Worked
Pits                 Under     15-19   20-24   25-29   30-34   35-39   40 cwt.     Total
employing:           15 cwt.   cwt.    cwt.    cwt.    cwt.    cwt.    and over    pits
100-499 wage-
earners              57        128     108     45      13      4       1           356
500 wage-earners
and over             39        187     184     94      21      7       4           536

(i) Compute the average output of coal per man-shift worked for each of the
two groups of pits.

(ii) Express the figures in the table as percentages of their respective totals
and make block diagrams for the two groups of pits on the one sheet of squared
paper, so that variations can easily be compared. (Inst. T., 1945)

[Ministry of Fuel and Power, Statistical Digest, 1944]
36. Number of families, on a percentage basis, classified by weekly family
income and by weekly expenditure on fuel.

(Based on a sample of working-class families before the war.)

                                      Weekly expenditure on fuel
                     Under   3s.-      4s.-      5s.-      6s.-      7s.-      8s.-      9s. and
Family income        3s.     3s. 11d.  4s. 11d.  5s. 11d.  6s. 11d.  7s. 11d.  8s. 11d.  over      Totals
Under 40s.           0.5     2.8       15.0      5.4       1.1       -         -         -         24.8
40s.-59s. 11d.       0.1     0.7       4.2       17.2      7.4       1.3       -         -         30.9
60s.-79s. 11d.       -       -         0.5       3.2       9.7       2.8       0.8       -         17.0
80s.-99s. 11d.       -       -         0.2       0.6       2.9       6.4       1.5       -         11.5
100s.-119s. 11d.     -       -         -         0.2       1.5       3.6       0.4       0.1       5.8
120s.-139s. 11d.     -       -         -         -         0.3       2.1       0.9       0.6       3.9
140s.-159s. 11d.     -       -         -         -         0.2       0.9       1.7       0.8       3.6
160s. and over       -       -         -         -         0.1       0.2       1.1       1.1       2.5
Totals               0.6     3.5       19.9      26.5      23.2      17.3      6.4       2.6       100.0

Find the mean and median weekly income per family, and the mean and
median weekly expenditure on fuel per family.

Show graphically, and in figures, the relationship between family income
and expenditure on fuel. (L.U. B.Com., 1947)

37. Coal Mines: Great Britain

                     Thickness of seam worked (in ft.)
                   2 and     3 and     4 and     5 and
         Under     under     under     under     under     6 and
Year     2         3         4         5         6         over      Total

                     Percentage of total output
1913     3.5       15.3      26.5      25.6      17.5      11.6      100
1924     4.6       17.6      26.9      23.9      17.0      10.0      100
1944     3.7       22.0      29.3      24.2      12.7      8.1       100

         Coal cut by machines as a percentage of the tonnage
         got in each range of thickness of seam
1913     24.4      16.9      9.8       3.7       3.3       1.5
1924     44.3      34.5      21.3      10.4      7.9       4.1
1944     84.8      89.2      88.1      77.3      67.3      54.4

[Ministry of Fuel and Power: Statistical Digest, 1945]

Compute for each year the ratio of coal cut by machines to total tonnage
for all thicknesses of seam.

Comment on the changes which have been made in the use of coal cutting
machinery.

(L.U. B.Sc. Econ., 1947)


38. Grocers, Provision Merchants and General Food Shops.

Analysis by Number of Sugar Registrations.

U.K., September, 1945

Number of Shops as Percentage of Respective Totals.

No. of sugar registrations     Multiples     Co-operatives
1-99                           6.1           0.8
100-199                        9.6           2.1
200-399                        25.5          12.0
400-599                        21.8          14.9
600-799                        14.6          13.6
800-999                        8.6           11.4
1000-1999                      11.3          32.8
2000-2999                      1.7           8.7
3000 and over                  0.8           3.7
                               100.0         100.0

[Annual Abstract of Statistics, 1935-46]

Draw cumulative frequency curves, on one diagram, to illustrate the above
table.

Obtain estimates of the median and quartiles of each distribution from
your graphs.


39. Average Weekly Earnings in the Brick, Pottery, Glass, Chemical,
etc., Group of Industries.

                            1924 (week ended 18 Oct.)     1928 (week ended 27 Oct.)
                            No. of         Average        No. of         Average
Industry                    workpeople     earnings       workpeople     earnings
                                           s.  d.                        s.  d.
Brick and Tile              53,636         51  1          35,548         51  2
Pottery                     57,512         40  4          48,426         39  9
Glass                       31,571         50  10         27,448         51  6
Chemical                    61,689         52  8          61,025         54  5
Soap, Candles, Oil, etc.    28,835         50  6          21,868         51  3
                            233,243                       194,315

[Annual Abstract of Statistics]

Calculate the average weekly earnings for the whole group of industries at
each date. Use two or more sets of weights and compare the results with the
unweighted averages.



40. The average family expenditure per week of a certain group was 79.2
shillings, of which 46.7% was on food.
The average family was:



Persons 

Equivalent M men 
per person 

Males over 14 years . 

M 

10 

Females „ 14 ,, 

1*4 

0*85 

Child 5-14 years . 

1*9 

0*65 

„ 0- 5 „ . 

0*5 

0*50 


Compute the food expenditure per family, per person and per "equivalent
man" when other persons are expressed as "men" by the factors above.

(L.U. Inter. Com.)


41. Coal Productivity

            Average no. of       Average no. of
            shifts worked per    shifts possible per                     Average output
            wage-earner on       wage-earner on        Absenteeism      in tons per
            colliery books       colliery books        percentage       man-shift worked
            per week (overall)   per week              (overall)        (overall)
1949 Jan.   4.77                 5.45                                   1.14
     Feb.   4.86                 5.58                                   1.16
   * March  4.82                 5.57                                   1.17
     April  4.49                 5.12                                   1.14
     May    4.75                 5.37                                   1.15
   * June   4.63                 5.25                                   1.14

* Average of five weeks.

[Monthly Digest of Statistics; Source: Ministry of Fuel and Power]

(i) Complete the above table.

(ii) Assuming that overall absenteeism affects output proportionately,
make an estimate of the amount of coal production lost during February
(4 weeks) through absenteeism per 1000 men on the colliery books.

42. U.K. Total Imports, 1949

          Estimated value     Index no. of       Index no. of
          at 1938 prices      average value      volume
Group     (£'000)             1938 = 100         1938 = 100
I         340,968             284                79
II        231,447             334                93
III       184,320             276                79

[Board of Trade Journal, Feb. 1950]

Estimate for each group the declared values in 1938 and in 1949.









56. Price Relatives: 1935 = 100


Commodity. 

1935. 

1936 

1937. 

1938. 

1939 

A 

100 

125 

112 

125 

•Iff 

B 

100 

120 

110 

120 

w 

C 

100 

87 

92 

108 * 

m 

D 

100 

75 

125 

150 

140 


The above table gives the price relatives of four commodities for the years
1936-39 inclusive, the price of each commodity in 1935 being stated as 100.
Calculate:

(i) An index number for each year, 1936-39, using the simple A.M.
of price relatives.

(ii) Index numbers for each year, using the chain base method.

(iii) Explain why, in general, the indices of (i) and the chained indices
are not in close agreement.

57. Data of Question 56.

Recalculate the price relatives, making 1938 the base year. Calculate an index
number for each year based on the simple arithmetic mean of price relatives.

58. External Trade of the U.K.: Retained Imports (£ mn.)

                                                                  Estimated value
                                    Value as declared             at 1938 prices
                                    1938      1948      1949      1948      1949
Class I:
Food, Drink and Tobacco             417.8     873.5     957.4     309.3     330.3
Class II:
Raw materials, etc.                 218.0     641.0     737.1     193.4     217.0
Class III:
Articles wholly or mainly
manufactured                        215.2     475.4     499.6     172.3     180.4
                                    851.0     1989.9    2194.1    675.0     734.3

[Board of Trade Journal]

Calculate index numbers of volume and average value for 1948 and 1949
(1938 = 100) for each class separately and for the three classes combined.

59 Index Number of Retail Prices for a Selection of Food Stuffs . U K. 





July 1914 

1st Sept, 



Purchases 

prices 

1939 priced 

Item. 

Unit 

(units) 

(pence) 

(pence), 

Flour 

6 lb 

1*5 

9 

iii 

Bread 

4 1b 

5*9 

H 

H 

Tea 

lb. 

0*81 

m 

28 

Sugar 

■ ,, 

6 5 

2 

3 

Milk 

quart 

4*8 

H 

tOf 

Butter 

lb 

1-95 

141 

til 

Cheese 

. ,, 

0*80 

81 

10 

Margarine 

, ,, 

0*93 

7 

H 


each 

10*6 

H i 

2 

Potatoes 

7 lb 

2*53 

4J ♦ 

6 i 





Compute an index number of retail prices for this selection of foodstuffs at
1 Sept., 1939 (July 1914 = 100).

(L.U. B.Com.)

60. Textile Exports of Ruritania, 1937 and 1940

                      1937                          1940
               Quantity       Value          Quantity       Value
Commodity      ('000 yds.)    (£'000)        ('000 yds.)    (£'000)
I              256            195.2          300            240.0
II             312            195.0          350            236.2
III            450            450.0          510            612.0
IV             360            405.0          420            577.5
                              1245.2                        1665.7

(i) Calculate P01, P10, Q01, Q10, where 0 indicates 1937 and 1 1940.

(ii) Calculate P01' and Q01'.

(iii) Calculate P01 (a) by the Marshall-Edgeworth formula and (b) by Fisher's
"ideal" formula.


Imports of Footwear into the United Kingdom

          From Czechoslovakia           From Switzerland          From U.S.A.
          Declared     Standardized     Declared     Volume       Declared     Price index
          value        value            value        index        value        (Paasche
          (£'000)      (£'000)          (£'000)                   (£'000)      index)
1936      290          290              344          100.0        222          100.0
1937      368          419.7            351          107.5        226          76.36
1938      681          660.8            367          109.8        315          57.16

[Working Party Report, Boots and Shoes; data adapted: H.M.S.O.]


61. Calculate, in respect of Czechoslovakia, price and volume indices for
the years 1937 and 1938 on 1936 as base.

62. Calculate, in respect of Switzerland, standardized values and price
indices for 1937 and 1938 with 1936 as base.

63. Calculate, in respect of the U.S.A., standardized values and volume
indices for 1937 and 1938 with 1936 as base.

64. The volume and price indices derivable from the data of the above
table are of the Laspeyre and Paasche types respectively. Check your
answers by making use of the relationship

Q01 × P01' = V01

65. Import of Footwear from Hungary into the U.K.

          Declared value
          (£'000)            Volume index     Price index
1936      40                 100.0            100.0
1937      -                  150.4            106.2
1938      -                  239.1            116.0

[Working Party Report, Boots and Shoes; data adapted: H.M.S.O.]



Calculate the declared and standardized values of imports of footwear from
Hungary into the U.K. from the above data.

66. See table of Question 33 of this Exercise.

Assuming the populations to be accurately stated and the number of
persons per dwelling unit to be correct to two places of decimals, calculate
the maximum and minimum number of dwelling units, consistent with the
data, (a) in each town, (b) in all the towns combined.

67. Using the data of Question 66 calculate the maximum and minimum
number of dwelling houses (a) in each town, (b) in all the towns combined,
which have more than one person per room, assuming that the percentages
are stated correct to one place of decimals.

68. Extract from Annual Abstract of Statistics, 1937-47

           Average prices in         Estimated quantities
           England and Wales,        harvested in England
           1938 (per cwt.)           and Wales, 1947
           s.   d.                   (thousand tons)
Wheat      9    9                    1670
Barley     9    3                    1620
Oats       7    4                    2500

The total value of all three cereals in 1947 was £143,000,000. Calculate a
price index number for cereals for 1947 with 1938 as base.

(R.S.S. Certificate, 1949)



CHAPTER V 

DISPERSION AND SKEWNESS 


I. Measures of Dispersion 
(i) Semi-interquartile Range or Quartile Deviation

\[ \text{Q.D.} = \tfrac{1}{2}(Q_3 - Q_1) \]

Methods of obtaining the quartile values of the variable were
considered in Chapter I.

In a symmetrical distribution

\[ \text{Md.} - Q_1 = Q_3 - \text{Md.} \]

and the range Md. ± Q.D. includes 50% of the frequencies.

In any distribution the range from \(Q_1\) to \(Q_3\) includes 50% of the
frequencies.

Similarly, measures based on quintiles or deciles may be used to
indicate dispersion.

(ii) Mean Deviation 

(a) Mean Deviation from the Mean 

The algebraic sum of the deviations from the mean is zero, i.e.,
\(\Sigma(x - \bar{x}) = 0\), but if the signs of the deviations be disregarded then

\[ \frac{\Sigma|x - \bar{x}|}{n} \]

is called the mean deviation from the mean of the distribution.

(b) Mean Deviation from the Median 

The sum of the absolute values of the deviations is the least 
possible when the median is taken as origin, but mean deviation as 
defined in (a) above is more usually employed. 

Example I. Find the mean deviation of the following numbers:
7.2, 10.7, 8.5, 6.2, 9.8, 7.4, 5.7, 4.6, 6.5, 7.7, 11.5.

\[ \bar{x} = 7.8, \qquad \frac{\Sigma|x - 7.8|}{11} = \frac{18.6}{11} = 1.691 = \text{Mean deviation from mean.} \]

\[ \text{Md.} = 7.4, \qquad \frac{\Sigma|x - 7.4|}{11} = \frac{18.0}{11} = 1.636 = \text{Mean deviation from median.} \]

Mean deviation from \(\bar{x}\) or Md. = M.D.
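The arithmetic of Example I can be repeated mechanically. The following is a minimal sketch, not part of the original text; the helper name `mean_deviation` is an assumption introduced for illustration, and the mean and median are taken from Python's standard `statistics` module.

```python
from statistics import mean, median


def mean_deviation(values, origin):
    """Average of the absolute deviations of `values` from `origin`."""
    return sum(abs(v - origin) for v in values) / len(values)


data = [7.2, 10.7, 8.5, 6.2, 9.8, 7.4, 5.7, 4.6, 6.5, 7.7, 11.5]

x_bar = mean(data)      # arithmetic mean of the series
md = median(data)       # middle item of the ordered series

md_from_mean = mean_deviation(data, x_bar)
md_from_median = mean_deviation(data, md)

print(round(x_bar, 1), md)
print(round(md_from_mean, 3), round(md_from_median, 3))
```

The deviation measured from the median comes out the smaller of the two, in keeping with the remark in (b) above.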







(iii) Standard Deviation 

This is the "root mean square" deviation, when deviations are
measured from the mean of the distribution, i.e., standard
deviation

\[ \sigma = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n}} \quad\text{for a simple series, or}\quad \sigma = \sqrt{\frac{\Sigma f(x - \bar{x})^2}{\Sigma f}} \]

in the case of a frequency distribution.


Example II. Find the standard deviation of the series of numbers in
Example I.

x        (x - 7.8)      (x - 7.8)^2
7.2      -0.6           0.36
10.7     +2.9           8.41
8.5      +0.7           0.49
6.2      -1.6           2.56
9.8      +2.0           4.00
7.4      -0.4           0.16
5.7      -2.1           4.41
4.6      -3.2           10.24
6.5      -1.3           1.69
7.7      -0.1           0.01
11.5     +3.7           13.69
         -9.3 +9.3      46.02

\[ \sigma = \sqrt{\frac{46.02}{11}} = \sqrt{4.1836} = 2.045. \]
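As a check on the table of Example II, the same calculation may be sketched in a few lines. The function name `std_dev` is an assumption made for illustration; the divisor is n, as in the text, so the result corresponds to what the standard library calls `statistics.pstdev`.

```python
from math import sqrt
from statistics import pstdev


def std_dev(values):
    """Root mean square deviation from the mean, with divisor n."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((v - x_bar) ** 2 for v in values) / n)


data = [7.2, 10.7, 8.5, 6.2, 9.8, 7.4, 5.7, 4.6, 6.5, 7.7, 11.5]

sigma = std_dev(data)
print(round(sigma, 3))
print(round(pstdev(data), 3))   # library cross-check, same divisor n
```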


(b) To simplify the calculation of standard deviation we may
measure deviations, in the first instance, from an arbitrary origin A.
Let A = 7.

x        (x - 7)        (x - 7)^2
7.2      +0.2           0.04
10.7     +3.7           13.69
8.5      +1.5           2.25
6.2      -0.8           0.64
9.8      +2.8           7.84
7.4      +0.4           0.16
5.7      -1.3           1.69
4.6      -2.4           5.76
6.5      -0.5           0.25
7.7      +0.7           0.49
11.5     +4.5           20.25
         -5.0 +13.8     53.06

\[ \bar{x} = A + \frac{\Sigma(x - A)}{n} = 7 + \frac{8.8}{11} = 7.8. \]

Then mean squared deviation, with origin at 7,

\[ = \frac{\Sigma(x - 7)^2}{n} = \frac{53.06}{11}. \]

But the mean squared deviation from the mean, \(\bar{x} = 7.8\), is required.
It can be shown that

\[ \frac{\Sigma(x - \bar{x})^2}{n} = \frac{\Sigma(x - A)^2}{n} - \left\{\frac{\Sigma(x - A)}{n}\right\}^2 \]

which may be written \(\sigma^2 = S^2 - c^2\).

Thus

\[ \sigma^2 = \frac{53.06}{11} - (0.8)^2 = 4.1836 \]

i.e., \(\sigma = 2.045\).

Note that \(\sigma^2\) is always less than \(S^2\) unless A coincides with the mean; i.e., the mean squared deviation
is a minimum when deviations are measured from the mean.
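The relation σ² = S² − c² may be verified for several arbitrary origins at once. The sketch below is illustrative only; `std_dev_from_origin` is an assumed helper name, and the three origins tried (0, 7 and 7.8) are chosen simply because they appear in the worked examples.

```python
from math import sqrt


def std_dev_from_origin(values, origin):
    """Standard deviation via sigma^2 = S^2 - c^2 for an arbitrary origin."""
    n = len(values)
    c = sum(v - origin for v in values) / n             # c = mean - origin
    s_sq = sum((v - origin) ** 2 for v in values) / n   # S^2, about the origin
    return sqrt(s_sq - c * c)


data = [7.2, 10.7, 8.5, 6.2, 9.8, 7.4, 5.7, 4.6, 6.5, 7.7, 11.5]

# Whatever origin is chosen, the corrected result should be the same.
results = {origin: std_dev_from_origin(data, origin) for origin in (0, 7, 7.8)}
for origin, sigma in results.items():
    print(origin, round(sigma, 3))
```

The choice of origin affects only the size of the numbers to be squared, which is why a round figure near the mean is convenient in hand calculation.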

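(c) In many cases the simplest procedure is to use arbitrary origin
zero. The calculation is then as follows:*

\[ \bar{x} = \frac{85.8}{11} = 7.8 \]

\[ \sigma^2 = \frac{\Sigma x^2}{n} - \bar{x}^2 = \frac{715.26}{11} - (7.8)^2 = 4.1836 \]

In calculating the standard deviation of a frequency distribution we
have:

\[ \frac{\Sigma f(x - \bar{x})^2}{\Sigma f} = \frac{\Sigma f(x - A)^2}{\Sigma f} - \left\{\frac{\Sigma f(x - A)}{\Sigma f}\right\}^2 \]

i.e., \(\sigma^2 = S^2 - c^2\), as before.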


Example III. Find the standard deviation of the numbers 2, 4, 6,
8, 10, 12, using zero as the arbitrary origin.

\[ \sigma^2 = \frac{4 + 16 + 36 + 64 + 100 + 144}{6} - 7^2 = \frac{364}{6} - 49 = 11\tfrac{2}{3} \]

\[ \sigma = 3.42. \]

* See page 195 for proof when origin is zero; see also answer to Question 1
(i), page 318.



Example IV. Find the standard deviation of the numbers 176, 172,
175, 169, 164, 179, 161, 163, 165, 173, 174, 178.

x        (x - 170)      (x - 170)^2
176      +6             36
172      +2             4
175      +5             25
169      -1             1
164      -6             36
179      +9             81
161      -9             81
163      -7             49
165      -5             25
173      +3             9
174      +4             16
178      +8             64
         +37 -28        427

\[ \frac{\Sigma(x - 170)}{n} = \frac{+37 - 28}{12} = 0.75 \]

\[ \sigma^2 = \frac{427}{12} - (0.75)^2 = 35.02 \]

\[ \sigma = 5.92 \]


Example V. Find the mean deviation of the numbers in Example IV
above.

Mean = 170 + 0.75 = 170.75.

Mean Deviation from Mean: If the above deviations had been
measured from 170.75 instead of from 170, those with a minus sign
would each have been 0.75 greater and those with a plus sign each 0.75
less. There are 5 with a minus sign and 7 with a plus sign.

Sum of deviations from mean, irrespective of sign,

= 37 + 28 + 0.75(5 - 7)

= 63.5

Mean deviation = 63.5/12 = 5.3.
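The adjustment used in Example V can be set beside the direct calculation. This is a sketch under a stated assumption: the short-cut holds only when no item lies strictly between the arbitrary origin and the mean, so the function (whose name, `mean_deviation_shortcut`, is introduced here for illustration) refuses to proceed otherwise.

```python
def mean_deviation_shortcut(values, origin):
    """Mean deviation from the mean, found from deviations about `origin`.

    Valid only when no value lies strictly between `origin` and the mean.
    """
    n = len(values)
    c = sum(v - origin for v in values) / n      # correction, mean - origin
    x_bar = origin + c
    low, high = min(origin, x_bar), max(origin, x_bar)
    if any(low < v < high for v in values):
        raise ValueError("a value lies between the origin and the mean")
    below = sum(1 for v in values if v <= low)   # items on the lower side
    above = sum(1 for v in values if v >= high)  # items on the upper side
    total_abs = sum(abs(v - origin) for v in values) + c * (below - above)
    return total_abs / n


data = [176, 172, 175, 169, 164, 179, 161, 163, 165, 173, 174, 178]

shortcut = mean_deviation_shortcut(data, 170)
x_bar = sum(data) / len(data)
direct = sum(abs(v - x_bar) for v in data) / len(data)
print(round(shortcut, 2), round(direct, 2))
```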


Example VI. Calculate (a) the quartile, (b) the mean and (c) the
standard deviation of wages from the following data:

Weekly wage in dollars:   35-   36-   37-   38-   39-   40-   41-42
No. of wage-earners:      14    20    42    54    45    18    7        Σf = 200

(x)            (f)            (x - A)                                          (F)
Wages          No. of                                                          Cumulative
($ per week)   wage-earners   (x - 38.5)    f(x - 38.5)    f(x - 38.5)^2       frequency
35-            14             -3            -42            126                 14
36-            20             -2            -40            80                  34
37-            42             -1            -42            42                  76
38-            54             -             -              -                   130
39-            45             +1            +45            45                  175
40-            18             +2            +36            72                  193
41-42          7              +3            +21            63                  200
               200                          -124 +102      428






(a) Quartile Deviation:

Q₁. ¼ of the total frequency = 50
Up to interval 37- there are 34 men, which leaves 16.
In the next interval (37-38) there are 42.
Q₁ = 37 + 16/42 = $37.38

Q₃. ¾ of the total frequency = 150
Up to interval 39- there are 130 men, leaving 20.
In interval 39-40 there are 45.
Q₃ = 39 + 20/45 = $39.44

Q.D. = ½(39.44 - 37.38) = $1.03.



(b) Mean Deviation from the Mean

Mean $= A + \frac{\Sigma f(x - A)}{\Sigma f} = 38.5 - \frac{22}{200} = 38.39$

Sum of deviations from 38.5, i.e. Σf(x - 38.5) disregarding signs,

$= 124 + 102 = 226$

but the sum of deviations from 38.39 is required.

If the deviations had been measured from 38.39 instead of from 38.5, the first three items in the (x - 38.5) column would have been less by 0.11 each, while the remaining items would each have been greater by 0.11, and since each of these items is multiplied by its frequency, it is clear that

$\Sigma f|x - \bar{x}| = 226 - 0.11(14 + 20 + 42) + 0.11(54 + 45 + 18 + 7)$

$= 226 + 0.11(48) = 231.28$

∴ Mean deviation = 231.28/200 = $1.1564. Say $1.16.


(c) Standard Deviation

$\sigma^2 = \frac{\Sigma f(x - 38.5)^2}{\Sigma f} - (0.11)^2 = \frac{428}{200} - 0.0121 = 2.1279$

σ = √2.1279 = $1.46.

In the above example σ : M.D. : Q.D. = 1.46 : 1.16 : 1.03
i.e., = 1 : 0.79 : 0.71.
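The three measures of Example VI can be recomputed from the frequency table. This is an illustrative Python sketch under the same assumptions as the text (class width 1, class mid-points as representative values, linear interpolation within a class); the helper name `quantile` is hypothetical.

```python
from math import sqrt

lower = [35, 36, 37, 38, 39, 40, 41]      # lower class limits
freq = [14, 20, 42, 54, 45, 18, 7]        # wage-earners per class
width = 1

def quantile(p):
    """Interpolate the value below which a fraction p of the frequency lies."""
    target = p * sum(freq)
    cum = 0
    for lo, f in zip(lower, freq):
        if cum + f >= target:
            return lo + width * (target - cum) / f
        cum += f

n = sum(freq)
mids = [lo + width / 2 for lo in lower]
mean = sum(f * m for f, m in zip(freq, mids)) / n
md = sum(f * abs(m - mean) for f, m in zip(freq, mids)) / n
sd = sqrt(sum(f * (m - mean) ** 2 for f, m in zip(freq, mids)) / n)
q1, q3 = quantile(0.25), quantile(0.75)
qd = (q3 - q1) / 2
print(round(qd, 2), round(md, 2), round(sd, 2))
```

The sketch works directly from the mean rather than from the origin 38.5, so it serves as an independent check on the short-cut corrections.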





An approximate relationship amongst the three measures of dispersion 
is given on page 241. 


Calculating in Class-interval Units 

If the interval of a frequency table is not unity—and irrespective 
of whether it is greater or less than unity—it is usually better to 
work in class-interval units, converting the results into the ordinary 
units of the table if and when necessary. This method is illustrated 
in the following example. 

Example VII. Calculate the mean deviation and the standard deviation of the following data, giving your results both in class units and in ordinary units.

                  Unit = 5
(x)      (f)     (x - 37.5)   f(x - 37.5)   f(x - 37.5)²
20-        3       - 3          -  9            27
25-       15       - 2          - 30            60
30-       45       - 1          - 45            45
35-       57         0             0             0
40-       50       + 1          + 50            50
45-       36       + 2          + 72           144
50-       25       + 3          + 75           225
55-60      9       + 4          + 36           144
Total    240                  - 84 + 233       695

In Class Units

$\frac{\Sigma f(x - 37.5)}{\Sigma f} = \frac{233 - 84}{240} = 0.6208$

$\sigma^2 = \frac{695}{240} - (0.6208)^2 = 2.5104$

$\sigma = 1.584$

Mean deviation $= \{(233 + 84) + 0.6208(3 + 15 + 45 + 57) - 0.6208(50 + 36 + 25 + 9)\}/240 = 1.3208$

In Ordinary Units

$\frac{\Sigma f(x - 37.5)}{\Sigma f} = 0.6208 \times 5 = 3.104$

$\sigma^2 = 2.5104 \times 5^2 = 62.76$

$\sigma = 1.584 \times 5 = 7.92$

M.D. $= 1.3208 \times 5 = 6.604$

Mean $= 37.5 + 3.104 = 40.604$.



Example VIII. Draw carefully a cumulative frequency curve of the frequency distribution of Example VI of this chapter. Verify the values for the quartiles found in that example, viz. 37.4 and 39.4.

Verify that the number of frequencies between mean ± σ, i.e., between x = 36.93 and x = 39.85, is 138, i.e., 69%.

Verify also that the number of frequencies between mean ± 2σ, i.e., between x = 35.47 and x = 41.31, is 190, i.e., 95%.

It should be noted that in a symmetrical, or moderately skewed, distribution the range mean ± σ will include about 68%, the range mean ± 2σ about 95% and the range mean ± 3σ nearly 100% of the frequencies. The reader should make a similar test on other suitable distributions.

Calculation of Standard Deviation when the Variate has both 
Positive and Negative Values 

Example IX. Find the standard deviation of the following series.

Values of x: -2.0, -1.4, -1.4, -0.6, +0.2, +0.4, +1.0, +1.4, +2.0, +2.4, +3.0, +3.2, +3.6, +3.6.

$\bar{x} = (20.8 - 5.4)/14 = 1.1$

$(x - \bar{x})$: -3.1, -2.5, -2.5, -1.7, -0.9, -0.7, -0.1, +0.3, +0.9, +1.3, +1.9, +2.1, +2.5, +2.5

$\Sigma(x - \bar{x})^2 = 49.42$

$\sigma = \sqrt{\frac{49.42}{14}} = 1.88$

As an alternative method, use the formula $\sigma^2 = \frac{\Sigma x^2}{n} - \bar{x}^2$.

Calculation of σ in Class-interval Units when the Unit is Fractional

Example X.

                 Unit = 0.2
(x)      (f)     (x - 1.9)   f(x - 1.9)   f(x - 1.9)²
1.2-       3       - 3         -  9           27
1.4-       7       - 2         - 14           28
1.6-      13       - 1         - 13           13
1.8-      21         0            0            0
2.0-      15       + 1         + 15           15
2.2-       9       + 2         + 18           36
2.4-       5       + 3         + 15           45
Total     73                 - 36 + 48       164

$\frac{\Sigma f(x - 1.9)}{\Sigma f} = \frac{48 - 36}{73} = 0.1644$

$\sigma = \sqrt{\frac{164}{73} - (0.1644)^2} = 1.49$ in class units.

In ordinary units, $\sigma = 1.49 \times 0.2 = 0.298$.


Coefficients of the Measures of Dispersion 

The quartile, mean and standard deviations in Example VI were 
stated in dollars. If the variable in Example VII be “ years of 
age ”, then the measures of dispersion are also years. To compare 
the variability of distributions, not expressible in the same unit, it 
may be helpful to divide a similar measure of dispersion in each by 
the origin to which it is referred, thus converting them into 
“ coefficients ” which are pure numbers, i.e., not dimensions, etc., 
and may therefore be directly compared. 


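(a) Coefficient of quartile deviation $= \frac{\text{Q.D.}}{\text{Median}}$.

(b) Coefficient of mean deviation $= \frac{\text{M.D.}}{\text{Mean}}$ or $\frac{\text{M.D.}}{\text{Median}}$.

(c) Coefficient of standard deviation $= \frac{\sigma}{\text{Mean}}$.

(c) is generally stated in the form $\frac{100\sigma}{\text{Mean}}$, when it is known as the “coefficient of variation”.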


Given the means, frequencies and standard deviations of two or more subgroups, to find the mean and standard deviation of the whole group.

Group A      Group B      Group (A + B)
  n₁           n₂           n = ?
  x̄₁           x̄₂           x̄ = ?
  σ₁           σ₂           σ = ?

$n = n_1 + n_2$, $\quad \bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$

Let $|\bar{x} - \bar{x}_1| = c_1$ and $|\bar{x} - \bar{x}_2| = c_2$. Then $\Sigma(x - \bar{x})^2$ for the A group when measured from $\bar{x}$ instead of from $\bar{x}_1$ will be $n_1\sigma_1^2 + n_1c_1^2$, and for the B group it will be $n_2\sigma_2^2 + n_2c_2^2$.

$\therefore\ n\sigma^2 = n_1(\sigma_1^2 + c_1^2) + n_2(\sigma_2^2 + c_2^2)$.





This formula can be expanded to include any number of groups: thus

$n\sigma^2 = n_1(\sigma_1^2 + c_1^2) + n_2(\sigma_2^2 + c_2^2) + n_3(\sigma_3^2 + c_3^2) + \ldots$

where $n = n_1 + n_2 + n_3 + \ldots$

It can be written briefly as $n\sigma^2 = \Sigma n_t(\sigma_t^2 + c_t^2)$, where $t = 1, 2, 3, \ldots$


Example XI. Find $n$, $\bar{x}$ and $\sigma$, given that

$n_1 = 55$, $\bar{x}_1 = 6.4$, $\sigma_1 = 1.23$

$n_2 = 45$, $\bar{x}_2 = 6.6$, $\sigma_2 = 1.62$

$n = 100$

$\bar{x} = \{(55 \times 6.4) + (45 \times 6.6)\}/100 = 6.49$

$100\sigma^2 = 55\{(1.23)^2 + (0.09)^2\} + 45\{(1.62)^2 + (0.11)^2\}$

$\sigma^2 = (83.66 + 118.64)/100 = 2.023$

$\sigma = 1.42$
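The pooling rule $n\sigma^2 = \Sigma n_t(\sigma_t^2 + c_t^2)$ translates directly into a few lines of code. A short Python sketch follows; the function name `combine` and the tuple layout `(n, mean, sd)` are assumptions made for illustration.

```python
from math import sqrt

def combine(groups):
    """Pool sub-groups given as (n, mean, sd) into one (n, mean, sd)."""
    n = sum(g[0] for g in groups)
    mean = sum(ni * mi for ni, mi, _ in groups) / n
    # each group contributes n_t * (sd_t^2 + c_t^2), with c_t = mean_t - pooled mean
    total = sum(ni * (si ** 2 + (mi - mean) ** 2) for ni, mi, si in groups)
    return n, mean, sqrt(total / n)

n, mean, sd = combine([(55, 6.4, 1.23), (45, 6.6, 1.62)])
print(n, round(mean, 2), round(sd, 2))
```

Because $c_t$ enters only as a square, its sign is immaterial, which is why the text can define it as an absolute difference.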


The standard deviation of the first n natural numbers is needed for 
some calculations. The derivation of the formula is given, as a 
useful exercise, below. 

The mean of the first n numbers $= \frac{n + 1}{2}$.

The sum of the squares of the numbers 1 to n $= \frac{n(2n + 1)(n + 1)}{6}$.

Using zero as origin,

$\sigma^2 = \frac{\Sigma x^2}{n} - (\bar{x})^2 = \frac{n(2n + 1)(n + 1)}{6n} - \left(\frac{n + 1}{2}\right)^2$

$= \{2(2n + 1)(n + 1) - 3(n + 1)(n + 1)\}/12$

$= (4n + 2 - 3n - 3)(n + 1)/12$

$= \frac{(n - 1)(n + 1)}{12} = \frac{n^2 - 1}{12}$

Thus the standard deviation of the numbers 1 to 10 inclusive is

$\sqrt{\frac{10^2 - 1}{12}} = 2.87$.
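The result $\sigma^2 = (n^2 - 1)/12$ for the first n natural numbers can be confirmed by direct calculation. The sketch below is illustrative only; `sd_direct` and `sd_formula` are hypothetical names.

```python
from math import sqrt

def sd_direct(n):
    """Standard deviation of 1, 2, ..., n from the definition."""
    xs = range(1, n + 1)
    mean = sum(xs) / n
    return sqrt(sum((x - mean) ** 2 for x in xs) / n)

def sd_formula(n):
    """Standard deviation of 1, 2, ..., n from the closed form."""
    return sqrt((n * n - 1) / 12)

print(round(sd_direct(10), 2), round(sd_formula(10), 2))
```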


Method of Summation 


The mean and standard deviation of a frequency distribution can 
be obtained by a method of summation as follows (see Question 27, 
Exercise 1). 





Example XII.

 (i)       (ii)      (iii)              (iv)
  x         f     1st Cumulation.   2nd Cumulation.
15-         2           2                  2
20-         5           7                  9
25-         8          15                 24
30-        11          26                 50
35-        15          41                 91
40-        20          61                152
45-        20          81                233
50-        17          98                331
55-        16         114                445
60-        13         127                572
65-        11         138                710
70-75       5         143                853
Total     143         853               3472

Let l be the mid-point of the last interval with a frequency entry; in this case l = 72.5.

Let w be the width of interval in the table; in this case w = 5.

Let $\frac{\text{Sum of col. (iii)}}{\Sigma f} = \frac{853}{143} = F_1 = 5.96504$

and $\frac{\text{Sum of col. (iv)}}{\Sigma f} = \frac{3472}{143} = F_2 = 24.27972$.

Then mean $= l - w(F_1 - 1)$

$= 72.5 - 5(4.96504)$

$= 47.6748$

and standard deviation $= w\sqrt{2F_2 - F_1 - F_1^2}$

$= 5\sqrt{48.5594 - 5.96504 - 35.5817}$

$= 13.2405$.


The method used here is equivalent to taking the mid-point of the interval beyond the last one with a frequency entry as the arbitrary origin and proceeding in the usual way, as the following calculation will show.

                  Unit = 5
(x)      (f)     (x - 77.5)   f(x - 77.5)   f(x - 77.5)²
15-        2       - 12         -  24           288
20-        5       - 11         -  55           605
25-        8       - 10         -  80           800
30-       11       -  9         -  99           891
35-       15       -  8         - 120           960
40-       20       -  7         - 140           980
45-       20       -  6         - 120           720
50-       17       -  5         -  85           425
55-       16       -  4         -  64           256
60-       13       -  3         -  39           117
65-       11       -  2         -  22            44
70-75      5       -  1         -   5             5
Total    143                    - 853          6091





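It will be noted that $\frac{\Sigma f(x - 77.5)}{\Sigma f} = -\frac{853}{143} = -F_1$.

Also $\Sigma f(x - 77.5)^2 = 6091 = 2NF_2 - NF_1$, where $N = \Sigma f$,

whence $\frac{\Sigma f(x - 77.5)^2}{\Sigma f} = 2F_2 - F_1$,

i.e., $\frac{\Sigma f(x - 77.5)^2}{\Sigma f} - \left\{\frac{\Sigma f(x - 77.5)}{\Sigma f}\right\}^2 = 2F_2 - F_1 - \left(\frac{853}{143}\right)^2 = 2F_2 - F_1 - F_1^2$.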


If the summation be performed upwards, then the sum of the 1st cumulation is 1006, and of the 2nd, 4543. Thus

$F_1 = \frac{1006}{143} = 7.034965$

and

$F_2 = \frac{4543}{143} = 31.76923$.

l now becomes the mid-point of the 1st interval which has a frequency entry and $w(F_1 - 1)$ must be added thereto.

mean $= 17.5 + 5(7.035 - 1)$

$= 47.675$

and $\sigma = 5(63.5385 - 7.0350 - 49.4905)^{\frac{1}{2}}$

$= 13.24$.
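The method of summation of Example XII lends itself to a short routine. The following Python sketch assumes equal class widths and cumulation downwards from the first class; the function name `summation_method` is hypothetical.

```python
from itertools import accumulate
from math import sqrt

def summation_method(freq, last_mid, width):
    """Mean and standard deviation from first and second cumulations."""
    n = sum(freq)
    first = list(accumulate(freq))        # column (iii)
    second = list(accumulate(first))      # column (iv)
    f1 = sum(first) / n
    f2 = sum(second) / n
    mean = last_mid - width * (f1 - 1)
    sd = width * sqrt(2 * f2 - f1 - f1 ** 2)
    return mean, sd

freq = [2, 5, 8, 11, 15, 20, 20, 17, 16, 13, 11, 5]
mean, sd = summation_method(freq, last_mid=72.5, width=5)
print(round(mean, 4), round(sd, 2))
```

Only additions are needed until the final step, which is what made the method attractive for hand or machine computation.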


Adjustment of Examination Marks 

The original marks, usually expressed as percentages, are known 
as the “ raw scores ”. 

If a number of candidates each write papers in n different subjects,
then it is inequitable simply to total raw scores to place the 
candidates in order of merit. The fact that the marks in the 
different subjects will probably have different amounts of spread 
introduces “ weights ” which will be to the advantage of those 
gaining high marks in subjects where the spread is great, and to the 
disadvantage of those gaining high marks in subjects where the 
range is narrower. 

One way of adjusting matters is to express the raw scores in “standard measure”. Thus, if $x_1, x_2, \ldots x_n$ are the marks in mathematics, e.g., the adjusted marks become $\frac{x_1 - \bar{x}}{\sigma}$, $\frac{x_2 - \bar{x}}{\sigma}$ and so on. The raw scores in the other subjects are similarly adjusted,
and the Z scores, as they are sometimes called, of each candidate 
may then fairly be added together to determine order of merit. 





Example XIII. In an examination candidates A, B and C obtained 35, 57 and 74 marks respectively in a certain paper. Compute the Z scores, given that the average mark for the paper was 52, and the standard deviation of the marks was 11.

A: $\frac{35 - 52}{11} = -\frac{17}{11} = -1.54$

B: $\frac{57 - 52}{11} = 0.45$

C: 2.0.

In order to avoid negative marks, an average and standard deviation are assumed, say 50 and 10, giving a range of marks of about 20-80, and T scores calculated thus:

A: 10(-1.54) + 50 = 34.6

B: 10(0.45) + 50 = 54.5

C: 10(2.0) + 50 = 70.0

Example XIV.

                  Raw Marks in
Candidate.    English.    Mathematics.    Total.
    A            84            75          159
    B            74            85          159

Average for English 60 with standard deviation 13.
Average for Maths. 50 with standard deviation 11.

Which is the better performance, A’s or B’s?

A: Z scores. $\frac{84 - 60}{13} + \frac{75 - 50}{11} = 1.85 + 2.27 = 4.12$

B: Z scores. $\frac{74 - 60}{13} + \frac{85 - 50}{11} = 1.08 + 3.18 = 4.26$

i.e., B’s is the better performance.
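The adjustment of marks in Examples XIII and XIV amounts to two small functions. This Python sketch is illustrative; `z_score` and `t_score` are hypothetical names, and the default T-scale (mean 50, standard deviation 10) follows the values assumed in the text.

```python
def z_score(mark, mean, sd):
    """Raw mark expressed in standard measure."""
    return (mark - mean) / sd

def t_score(mark, mean, sd, new_mean=50, new_sd=10):
    """Z score rescaled to an assumed mean and standard deviation."""
    return new_sd * z_score(mark, mean, sd) + new_mean

# Example XIV: equal raw totals, different standardised totals
a_total = z_score(84, 60, 13) + z_score(75, 50, 11)
b_total = z_score(74, 60, 13) + z_score(85, 50, 11)
print(round(a_total, 2), round(b_total, 2))
```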


Formulae 

I. $\Sigma(x - \bar{x})^2$ or $\Sigma f(x - \bar{x})^2$ = sum of squared deviations from the mean: this is referred to briefly as “Sum of Squares”.

II. $\Sigma(x - A)^2$ or $\Sigma f(x - A)^2$ = sum of squared deviations from an arbitrary origin: it is referred to briefly as “Crude sum of squares”.

III. $\frac{1}{n}\Sigma(x - \bar{x})^2$ = the variance = $\sigma^2$.





Formulae for obtaining I and III can be expressed in a variety of 
ways, and the reader is advised to use, in any particular calculation, 
the formula best suited for the purpose, and not to adhere slavishly 
to one method. 


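When the Arbitrary Origin is Zero

$\Sigma(x - \bar{x})^2 = \Sigma x^2 - n\bar{x}^2$ . . . (i)

$= \Sigma x^2 - \frac{(\Sigma x)^2}{n}$ . . . (ii)

  x      x²
  6      36
  7      49
  8      64
  5      25
  4      16
 30     190

$\Sigma(x - \bar{x})^2 = 190 - 5(6)^2 = 10$ from (i)

$= 190 - \frac{30^2}{5} = 10$ from (ii)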


When the Arbitrary Origin is not Zero

The reader is accustomed to the formula for the variance, i.e., for $\sigma^2$, viz.

$\frac{1}{n}\Sigma(x - \bar{x})^2 = \frac{1}{n}\Sigma(x - A)^2 - \left\{\frac{\Sigma(x - A)}{n}\right\}^2$

Multiplying throughout by n we get:

$\Sigma(x - \bar{x})^2 = \Sigma(x - A)^2 - \frac{\{\Sigma(x - A)\}^2}{n}$ . . . (iii)

$= \Sigma(x - A)^2 - n(\bar{x} - A)^2$ . . . (iv)

  x      (x - 160)    (x - 160)²
 161        1             1
 163        3             9
 167        7            49
 162        2             4
 164        4            16
 168        8            64
 166        6            36
 Total     31           179

$\bar{x} = 160 + \frac{31}{7} = 164.42857$

$\Sigma(x - \bar{x})^2 = 179 - \frac{31^2}{7}$ from (iii)

$= 41.7143$

$= 179 - 7(4.42857)^2$ from (iv)

$= 41.7143$.
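Formulae (ii) and (iii) are the same rule with different origins, which a few lines of code make plain. A minimal Python sketch, with `sum_of_squares` as an illustrative name:

```python
def sum_of_squares(values, origin=0):
    """Sum of squared deviations from the mean, computed from an arbitrary origin."""
    n = len(values)
    devs = [x - origin for x in values]
    crude = sum(d * d for d in devs)          # crude sum of squares about the origin
    return crude - sum(devs) ** 2 / n         # correction as in formula (iii)

print(sum_of_squares([6, 7, 8, 5, 4]))                                   # origin zero
print(round(sum_of_squares([161, 163, 167, 162, 164, 168, 166], 160), 4))  # origin 160
```

Whatever origin is chosen, the corrected result is unchanged; only the size of the intermediate figures differs.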



The Lorenz Curve 

This diagram shows effectively how the actual distribution of capital, or income, for example, in a population departs from “equal distribution”.





Example XV.

Draw a Lorenz curve to illustrate the following table:

                   No. of     % of       Group        % of       Cumulative % of
Range of           persons    total      income       total
income.            ('000)     numbers.   (£000).      income.    numbers.   income.
£100-               2667       54.21       731,500     26.60      100.00     100.00
£250-               1872       38.05       940,500     34.20       45.79      73.40
£500-                253        5.14       357,500     13.00        7.74      39.20
£1000-                82        1.67       233,750      8.50        2.60      26.20
£2000-                34        0.69       195,250      7.10        0.93      17.70
£5000 and over        12        0.24       291,500     10.60        0.24      10.60
Total               4920      100.00     2,750,000    100.00




[Fig. 7. Lorenz curve. Horizontal axis: per cent of numbers of incomes.]




Cumulative percentages of numbers of incomes are plotted on the 
horizontal scale and cumulative percentages of total income on the 
vertical scale. The dotted diagonal line is a line of equal distribution 
of income—i.e., 20% of persons get 20% of total income, etc.—and the 
departure of the Lorenz curve therefrom illustrates the inequality of the 
actual distribution: 

e.g., 0.24% of all persons have 10.6% of total income,
      0.93% of all persons have 17.7% of total income, etc.

The cumulation in the above table is on the “more than” plan; i.e., 100% of numbers have £100 or more. If it had been done on the “less than” plan, the graph of the cumulative percentage columns
would have given the Lorenz curve in the lower right-hand half of the 
square, instead of in the upper left-hand half, but the interpretation 
would, of course, have been the same. 
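The cumulative percentage columns of Example XV, on the “more than” plan, can be reproduced as follows. This is an illustrative Python sketch; the function name `cumulative_pct_more_than` is an assumption, and the figures are those of the table.

```python
persons = [2667, 1872, 253, 82, 34, 12]                            # thousands
income = [731_500, 940_500, 357_500, 233_750, 195_250, 291_500]    # £000

def cumulative_pct_more_than(values):
    """Percentage of the total lying in each class and all higher classes."""
    total = sum(values)
    running, out = 0, []
    for v in reversed(values):            # accumulate from the highest class down
        running += v
        out.append(100 * running / total)
    return out[::-1]                      # restore the original class order

cum_numbers = cumulative_pct_more_than(persons)
cum_income = cumulative_pct_more_than(income)
for p, q in zip(cum_numbers, cum_income):
    print(f"{p:7.2f} {q:7.2f}")
```

Plotting `cum_numbers` against `cum_income` gives the points of the Lorenz curve; the diagonal of equal distribution is the line on which the two percentages are equal.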

II. Skewness 

In a symmetrical distribution mean, median and mode coincide. In a skew distribution the three averages have different values. If a distribution is negatively skewed, then $\bar{x} < Md. < M_0$; if it is positively skewed, then $M_0 < Md. < \bar{x}$. It has already been pointed out that in a moderately skewed distribution the approximate value of the mode may be obtained from the empirical relationship

$M_0 = \bar{x} - 3(\bar{x} - Md.)$

Measures of Skewness

(ii) $Sk. = \frac{(Q_3 - Md.) - (Md. - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Md.}{2\,\text{Q.D.}}$

Both (i) and (ii) are zero when the distribution is symmetrical. 
Also each of them is a pure number, since in each case the numerator 
and denominator have the same dimensions. Values of Sk. derived 
from: 

(ii) lie between — 1 and + 1, but there are no theoretical 
limits to values of Sk. derived from (i); 

(iii) A third measure of skewness is given on page 197. 
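The quartile measure (ii) is easily evaluated once the quartiles and median are known. The sketch below is illustrative only: the function name is hypothetical, and the median 38.44 used for Example VI is an assumption obtained by the same interpolation as the quartiles (38 + 24/54), not a figure quoted in the text.

```python
def quartile_skewness(q1, median, q3):
    """Quartile measure of skewness: (Q3 + Q1 - 2 Md) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

sk = quartile_skewness(37.38, 38.44, 39.44)
print(round(sk, 3))
```

Since the median lies between the quartiles, the numerator can never exceed the denominator in size, which is why this measure lies between -1 and +1.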





Exercise 5 

1. Find by direct calculation the standard deviation of the numbers 1 to 13 inclusive.

2. Calculate the standard deviation (a) of the first 13 odd numbers, (b) of the first 13 even numbers.

3. Calculate the mean deviation and the standard deviation for each of the 
following : 

(a) Chapter I, Example VII 

(b) „ „ XI 

(c) „ „ XIII 

4. Calculate the standard deviation for each of questions 1, 2, 3, 4 and 5
of Exercise I. 

5. Calculate the coefficient of variation for each of questions 11, 12 and 17 of Exercise I.

6. Calculate the mean deviation and the standard deviation for each of questions 24, 26 and 27 of Exercise I.

7. Obtain a formula for the standard deviation of the first n odd numbers, given that the sum of $1^2, 3^2, 5^2, \ldots$ to n terms is

$n(2n + 1)(2n - 1)/3$

8. Eight coins are tossed 256 times. The theoretical frequencies of 0 head, 1 head . . . 7 heads, 8 heads per tossing are given below. Calculate the mean number of heads per tossing and the standard deviation. Verify that $\sigma = \frac{1}{2}\sqrt{8}$.

No. of heads (x):   0   1   2    3    4    5    6   7   8
Frequency (f):      1   8   28   56   70   56   28  8   1

Verify that the frequencies are given by the terms of the expansion of $2^8(\frac{1}{2} + \frac{1}{2})^8$.

9. Value of variable:   20-   30-   40-   50-   60-   70-   80-   90-100
   Frequency:            3    14    46    58    49    37    26    10       Σf = 243

(a) Compute the mean deviation and the standard deviation of the above distribution.

(b) Estimate, by interpolation on the ogive, the number of items with values of the variable between (i) 45-55, (ii) 55-65, (iii) 65-75.

10. For a certain group of wage-earners, the median and quartile wages per week were $44.3, $43.0 and $45.9, respectively. Wages for the group ranged between $40 and $50. 10% of the group had under $42 per week, 13% had $47 and over and 6% $48 and over. Put these data into the form of a frequency distribution, and hence obtain an estimate of the mean wage and the standard deviation.

11. Calculate the mean and standard deviation of the following distribution. Draw an ogive of the distribution and use it to determine the proportion of frequencies between (a) $\bar{x} \pm \sigma$, (b) $\bar{x} \pm 1.5\sigma$, (c) $\bar{x} \pm 2\sigma$, (d) $\bar{x} \pm 2.5\sigma$, (e) $\bar{x} + 1.5\sigma$ and $\bar{x} + \sigma$, (f) $\bar{x} - 1.5\sigma$ and $\bar{x} - \sigma$.

x:   2.5-   7.5-   12.5-   17.5-   22.5-   27.5-   32.5-   37.5-
f:   12     28     65      121     175     198     176     120

x:   42.5-   47.5-   52.5-   57.5-62.5
f:   66      27      9       3             Σf = 1000

12. Ten coins are tossed 1024 times. The theoretical frequencies of 10 heads, 9 heads, . . . 1 head, 0 head are given by the expansion of $2^{10}(\frac{1}{2} + \frac{1}{2})^{10}$. Calculate the mean number of heads per tossing, and the standard deviation.

x                 f
No. of heads
per tossing.      Frequency.
    0                 1
    1                10
    2                45
    3               120
    4               210
    5               252
    6               210
    7               120
    8                45
    9                10
   10                 1
                   1024

13. Four dice are tossed 1296 times. The theoretical frequencies of 0 sixes, . . . 3 sixes, 4 sixes are given by expanding $6^4(\frac{5}{6} + \frac{1}{6})^4$. Calculate the mean number of sixes per tossing and the standard deviation.

x                 f
No. of sixes
per tossing.      Frequency.
    0               625
    1               500
    2               150
    3                20
    4                 1
                   1296


14. A frequency distribution is in the form of an isosceles triangle, base 40, height 30. The mean = median = 20; total frequency = area = 600. Show that the quartile deviation is $20 - 20/\sqrt{2} = 5.86$.

15. From the data below, giving the averages and standard deviations of four sub-groups, calculate the average and the standard deviation of the whole group.

                                                Standard
Sub-group.    No. of men.    Average wage.      deviation wage.
                               s.  d.             s.  d.
    A             50           61  0               8  0
    B            100           70  0               9  0
    C            120           80  6              10  0
    D             30           83  0              11  0
  Total:         300

(L.U. B.Sc. Econ., 1932)

16. The first of two sub-groups has 100 items with mean 15 and standard deviation 3. If the whole group has 250 items with mean 15.6 and standard deviation $= \sqrt{13.44}$, find the mean and standard deviation of the second sub-group.

17. Let a frequency distribution be represented by the portion of the curve, $y = 8x - x^2$, above the axis of x, i.e., for values of x, $0 \le x \le 8$.

Take values of y at x = 0, 1, 2, . . . 8 to represent frequencies and compute the standard deviation and the mean deviation; verify that σ : M.D. = 1 : 0.84.

Note: area of distribution = ⅔ area of circumscribing rectangle.

18. Show that the standard deviation of the first n even numbers is the same as the standard deviation of the first n odd numbers, given that the sum of $2^2 + 4^2 + 6^2 + \ldots + (2n)^2$ is $2n(2n + 1)(n + 1)/3$.

19. A group has $\bar{x} = 10$, $n = 60$, $\sigma^2 = 4$.

A sub-group of the above has $\bar{x}_1 = 11$, $n_1 = 40$, $\sigma_1^2 = 2.25$. Find the mean and standard deviation of the other sub-group.

20. Recalculate, for any of the above distributions, the mean and the 
standard deviation by the summation method. 

21. Three candidates in a certain examination had the same aggregate marks and were bracketed equal. Use the following data to determine whether or not this placing was equitable.

                 Marks awarded out of 100 for
Candidate.    English.    Science.    Mathematics.    Totals.
    A            95          70           61            226
    B            69          83           74            226
    C            70          74           82            226

The mean marks were 55 for English, 53 for Science and 50 for Mathematics, and the standard deviations, in the same order, were 16, 12 and 11.

22. If four ordinary dice are tossed 1296 times and the “spots” on the upturned faces summed, then the theoretical frequencies of sums of 4, 5, 6, . . . 23, 24 are as shown in the table below:

Sums:   4   5   6    7    8    9    10   11    12    13    14    15    16
f:      1   4   10   20   35   56   80   104   125   140   146   140   125   etc.     Σf = 1296

Verify that mean = 3.5n, where n = no. of dice = 4, and that $\sigma^2 = \frac{35n}{12}$.

23. A group of 100 items has a mean of 60 and a variance of 25. If the mean of 50 of these items is 61 and its standard deviation 4.5, find the mean and variance of the other 50 items. Write down the “sum of squares” of the whole group and of each sub-group.

24. The sum of 10 numbers is 60. $\Sigma(x - \bar{x})^2 = 36$. Find $\Sigma x^2$ and $\Sigma(x - 5)^2$.

25. Given $\Sigma x = 99$, $n = 9$, $\Sigma(x - 10)^2 = 79$, find $\Sigma x^2$ and $\Sigma(x - \bar{x})^2$.


26. Weights of Adult British Males

Weight (lbs.).    Percentage frequency.
   90-                  0.02
  100-                  0.44
  110-                  1.96
  120-                  5.03
  130-                 11.19
  140-                 20.94
  150-                 20.12
  160-                 17.12
  170-                 10.15
  180-                  6.14
  190-                  3.39
  200-                  1.38
  210-                  1.13
  220-                  0.53
  230-                  0.20
  240-                  0.14
  250-                  0.10
  260-                  0.02
                      100.00

Calculate the mean and median weights. Calculate also the quartile and standard deviations.

Assume that weights were taken to the nearest lb.

27. Age group (years):   10-   12-   15-    21-    25-    30-    35-    45-   55-   65-69
    % frequency:         2.3   3.4   15.5   12.0   18.6   16.3   14.9   9.6   5.4   2.0      Σf = 100.0

Calculate the mean age and the standard deviation. Calculate also the quartile deviation.

28. Calculate the quartile, mean and standard deviations of the distribution 
of Question 2 (Additional Exercises, p 119). Obtain a measure of skewness 
from the formula 



a 




29. Calculate the mean deviation and the standard deviation of the distribution of Question 4, of the same Exercises.

30. From the data of the bivariate table of Question 7 of the same Exercises calculate

(i) the mean number of persons in family and the standard deviation,
(ii) the mean number of rooms per family and the standard deviation, and
(iii) the mean deviation of each distribution.

31. Calculate the standard deviation of the distribution of Question 9 (Additional Exercises).

32. Calculate the standard and mean deviations of the distribution of candidates’ marks, Question 22 (Additional Exercises).

33. Calculate the mean deviation, the standard deviation and a measure of skewness of the distribution of schoolboys' heights, Question 24 (Additional Exercises).

34. Calculate the mean deviation and the standard deviation of (a) the numbers of earners and (b) the numbers of non-earners of the table of Question 34 (Additional Exercises).

35. Area of “parlour” in the homes of 398 working-class families, analysed by whether uncarpeted or carpeted.

Area of parlour               Number
(sq. ft.)          Uncarpeted.    Carpeted.
Unclassified            5             3
 45-64                  2             2
 65-                   19            16
 85-                   56            56
105-                   74            39
125-                   32            22
145-                   10             8
165-                   14            10
185-                    6             7
205-                    -             1
225-                    2             2
245-                    3             2
265-                    2             -
285-                    2             -
305-                    1             1
325-                    -             -
345-364                 1             -
                      229           169

[Working Party Report: Carpets. Source: Social Survey Report, 1946, H.M.S.O.]

Calculate (i) the mean parlour area and (ii) the standard deviation for each of the above distributions. Disregard the “unclassified” entries.





36. Proportionate Age Distribution of Insured Persons in the Boot, Shoe and Clog Industry in Great Britain at July, 1937

Age (years).    % males.    % females.
  14-              7.2         11.0
  16-              8.7         13.2
  18-              7.8         11.7
  21-             10.7         15.4
  25-             11.4         13.7
  30-             12.2         12.3
  35-              9.6          8.5
  40-              7.8          5.9
  45-              7.0          4.2
  50-              6.7          2.3
  55-              5.6          1.2
  60-64            5.3          0.6
                 100.0        100.0

Calculate, both for males and for females
(i) the mean age and the median age,
(ii) the standard deviation and the quartile deviation.

[Working Party Report: Boots and Shoes, H.M.S.O.]


37. Chest-girth and Height of Males Aged 20 Examined by Medical Boards, Great Britain, 1939

Chest-girth (ins.):   Under 31   31-    33-    35-    37-    39-    41 and over   Total
% cases:                0.9      6.6    28.5   40.0   19.2   1.9       0.9        100.0

Height (ins.):   Under 61   61-   63-    65-    67-    69-    71-   73 and over   Total
% cases:           0.7      3.4   12.4   26.2   29.6   19.0   6.8      1.9        100.0

[Source: Martin, The Physique of Young Adult Males, 1949]

Estimate the standard deviation of each distribution. Would you say that height is a more variable characteristic than chest measurement amongst these males? (L.U. B.Sc. Econ., 1950)


38. Retailers of Chocolates and Sugar Confectionery
(Analysis by size of outlet, Nov. 1945)

No. of four-weekly rations sold.    No. of theatres and cinemas.
  Under 50 *                              697
    50-  99                               398
   100- 199                               391
   200- 399                               333
   400- 599                               100
   600- 799                                30
   800- 999                                12
  1000-1499                                14
  1500-1999                                 9
  2000-2499                                 2
  2500-2999                                 1
                                         1987

* Take mid-point as 37.5.

[Annual Abstract of Statistics, 1937-47. Source: Ministry of Food]

Calculate the mean and median number of four-weekly rations. Calculate also the standard deviation.

39. Depth of Shaft and Output Got
(Expressed as tons of coal in each 10,000 tons got in 1944)

Average lowest winding       North Midland    North-western
depth of shaft (in yards).   region.          region.
Under 50 *                        49               44
 50 and under  150               588              273
150 and under  250             1,710              961
250 and under  350             2,609            1,593
350 and under  450             1,605            2,017
450 and under  550             1,864            1,419
550 and under  650               911            1,640
650 and under  750               377              692
750 and under  850                81              936
850 and under  950               206              221
950 and under 1050                 -              204
                              10,000           10,000

* Take mid-point as 25 yards.

[Supplement to Statistical Digest, 1945. Ministry of Fuel and Power]

Calculate for each region
(i) the mean and median depths of shaft,
(ii) the standard and quartile deviations,
(iii) measures of skewness based on the results of (ii).

40. In the manufacture of a certain scientific instrument great importance is attached to the life of a particular critical component (x). This component is obtained in bulk from two sources (A and B), and in the course of inspection the lives of 1000 of the component (x) from each source are determined and the following frequency tables are obtained.

         Source A                          Source B
                   No. of                            No. of
Life (hours).      components.    Life (hours).      components.
1000-1020              40         1030-1040             339
1020-1040              96         1040-1050             136
1040-1060             364         1050-1060              25
1060-1080             372         1060-1070              20
1080-1100              85         1070-1080             130
1100-1120              43         1080-1090             350

Examine the effectiveness of the measures of dispersion with which you are familiar for comparing the dispersions of the two distributions.

(R.S.S. Certificate, 1951)






CHAPTER VI 


LINES OF “BEST FIT”. CORRELATION 
I. Straight Lines of Best Fit 

When data relating to two variables are plotted on squared paper a straight-line trend is often suggested by the diagram; i.e., the plotted points lie approximately in a straight line. The question is, which straight line best expresses the relationship between the variables?

[Fig. 8.]

Let the observed data be represented by

Values of X: $X_1, X_2 \ldots X_n$
          Y: $Y_1, Y_2 \ldots Y_n$

Suppose the straight line of best fit to be as shown above. Then the vertical distances between the plotted points and the straight line are the differences between the observed and the calculated values of Y; calculated, that is, from the equation of the line of best fit.




One definition of a line of best fit is that it must be such that the sum of the squares of these differences is the least possible.

The equation of a straight line may be written $Y = a + bX$. The problem is to determine values of the constants a and b so that the sum of $(Y_1 - a - bX_1)^2 + (Y_2 - a - bX_2)^2 + \ldots$ shall be a minimum.

Since there are two unknowns, a and b, two equations are necessary to determine their values.

These two equations, known as the Normal Equations, whose derivation is explained on page 181, are

$na + b\Sigma X = \Sigma Y$ . . . (i)

$a\Sigma X + b\Sigma X^2 = \Sigma XY$ . . . (ii)

where n is the number of pairs of observations. The appropriate values of a and b are found by substituting in the normal equations the values of $\Sigma X$, $\Sigma X^2$, $\Sigma XY$ calculated from the data and then solving the resultant simultaneous equations.


Example I. Find the equation of the straight line which best fits the following data.

  X      X²      Y      XY
  3       9      5      15
  5      25      9      45
  7      49     12      84
  9      81     18     162
 24     164     44     306

Then  44 = 4a + 24b . . . (i)
     306 = 24a + 164b . . . (ii)

which gives a = -1.6, b = 2.1

and the required equation is Y = 2.1X - 1.6.

    Y₀               Y_c                  d             d²
Observed Y.    Y obtained from       (Y₀ - Y_c)    (Y₀ - Y_c)²
               the eqn.
     5              4.7                + 0.3          0.09
     9              8.9                + 0.1          0.01
    12             13.1                - 1.1          1.21
    18             17.3                + 0.7          0.49
                                                      1.80



$\Sigma(Y_0 - Y_c)^2$ is a minimum: i.e., any change in a and b from -1.6 and 2.1, respectively, would give a value for $\Sigma d^2$ greater than 1.80.

For example, a = -1.59, b = 2.09.

  Y₀      Y_c       d²
   5      4.68     0.1024
   9      8.86     0.0196
  12     13.04     1.0816
  18     17.22     0.6084
                   1.8120 = Σd²


It can easily be shown that the line of best fit passes through the point $(\bar{X}, \bar{Y})$, i.e., (6, 11) in the above example. If this point instead of the point (0, 0) be taken as origin, which is simply equivalent to taking deviations of X and Y from their respective means instead of the original data, the normal equations become

$\Sigma y = na + b\Sigma x$ . . . (i)

$\Sigma xy = a\Sigma x + b\Sigma x^2$ . . . (ii)

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$. Since $\Sigma x = \Sigma y = 0$,

equation (i) reduces to $na = 0$, $a = 0$;

equation (ii) reduces to $\Sigma xy = b\Sigma x^2$, $b = \frac{\Sigma xy}{\Sigma x^2}$.

Therefore the equation to the straight line of best fit referred to $(\bar{X}, \bar{Y})$ as origin is $y = \frac{\Sigma xy}{\Sigma x^2}x$.

Using the same data we now have:

  X      x      x²      Y      y      xy
  3     - 3     9       5     - 6     18
  5     - 1     1       9     - 2      2
  7     + 1     1      12     + 1      1
  9     + 3     9      18     + 7     21
               20                     42

$b = \frac{42}{20} = 2.1$

i.e., $y = 2.1x$.

Then to obtain the equation with (0, 0) as origin write

$Y - \bar{Y} = \frac{\Sigma xy}{\Sigma x^2}(X - \bar{X})$

$Y - 11 = 2.1(X - 6)$

$Y = 2.1X - 1.6$ as before.
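The solution of the normal equations for Example I can be sketched in code. This is an illustrative Python fragment, not the book's procedure verbatim: it solves the two equations by elimination, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Solve the normal equations  n a + b ΣX = ΣY,  a ΣX + b ΣX² = ΣXY."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

xs, ys = [3, 5, 7, 9], [5, 9, 12, 18]
a, b = fit_line(xs, ys)
# sum of squared differences between observed and calculated Y
ssd = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
print(round(a, 2), round(b, 2), round(ssd, 2))
```

Trying nearby values such as a = -1.59, b = 2.09 in the last line should always give a larger sum of squares, which is the least-squares property the text describes.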






Example II. Fit a straight line to the following data by the method of "least squares".

Year:                                   1929   1930   1931   1932   1933    (X)
% of insured work-people unemployed:    11.3   13.0    9.7   10.6   10.7    (Y)

In a time series it is often advisable to call the middle year of the series 0. The X values then become -2, -1, 0, 1, 2.

   X     X^2      Y        XY
  -2      4     11.3     -22.6
  -1      1     13.0     -13.0
   0      0      9.7        0
   1      1     10.6     +10.6
   2      4     10.7     +21.4
  ---    ---    ----     -----
   0     10     55.3     -35.6 + 32.0 = -3.6

Then

$\sum Y = na$:   $55.3 = 5a$,   $a = 11.06$

$\sum XY = b\sum X^2$:   $-3.6 = 10b$,   $b = -0.36$

$Y = -0.36X + 11.06$ .... (a)

Suppose, however, we had proceeded by calling 1929 year 1, thus having X values 1, 2, 3, 4, 5; we should then have obtained

$Y = -0.36X + 12.14$ .... (b)

The difference in the value of a, i.e., the intercept on the axis of Y, is due to the fact that in (a) the origin is the year 1931, and in (b) the year 1928.

Find the "expected" value of the unemployed percentage for 1933 from both of the above equations.

(a) $Y = -0.36(2) + 11.06 = 10.34$.

(b) $Y = -0.36(5) + 12.14 = 10.34$.
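The effect of the choice of origin can be verified with a few lines of Python. This is a sketch under stated assumptions: the helper names `fit_line`, `fit_from_origin` and `expected` are invented for the illustration, and the data are those of Example II.

```python
YEARS = [1929, 1930, 1931, 1932, 1933]
PERCENT = [11.3, 13.0, 9.7, 10.6, 10.7]


def fit_line(xs, ys):
    """Return (a, b) of the least-squares line Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum_xy / sum_xx
    return y_bar - b * x_bar, b


def fit_from_origin(origin_year):
    """Fit the line with X measured in years from `origin_year`."""
    xs = [year - origin_year for year in YEARS]
    return fit_line(xs, PERCENT)


def expected(origin_year, year):
    """Value of Y given by the fitted line for the stated year."""
    a, b = fit_from_origin(origin_year)
    return a + b * (year - origin_year)


a_mid, b_mid = fit_from_origin(1931)     # X = -2, -1, 0, 1, 2
a_1928, b_1928 = fit_from_origin(1928)   # X = 1, 2, 3, 4, 5
print(a_mid, b_mid, a_1928, b_1928)
print(expected(1931, 1933), expected(1928, 1933))
```

Only the intercept changes with the origin; the slope and the value expected for any given year are the same in both cases.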

If, in calculating lines of best fit, the deviations of X and Y are 
measured in the first instance from arbitrary origins A and B, 
respectively, certain corrections will have to be made. As has 
already been shown 

$\sum x^2$, i.e., $\sum (X - \bar{X})^2 = \sum (X - A)^2 - n(\bar{X} - A)^2$

$\sum y^2$, i.e., $\sum (Y - \bar{Y})^2 = \sum (Y - B)^2 - n(\bar{Y} - B)^2$.




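The correction from $\sum (X - A)(Y - B)$ to $\sum (X - \bar{X})(Y - \bar{Y})$, i.e., $\sum xy$, is now needed.

$\sum xy = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \bar{X}\sum Y - \bar{Y}\sum X + n\bar{X}\bar{Y}$
$= \sum XY - n\bar{X}\bar{Y} - n\bar{X}\bar{Y} + n\bar{X}\bar{Y}$
$= \sum XY - n(\bar{X}\bar{Y})$, when A = 0, B = 0.

When A and B are not zero, let $\bar{X} - A = c_1$ and $\bar{Y} - B = c_2$, each taken with its proper sign.
Then $\sum xy = \sum (X - A)(Y - B) - n(c_1c_2)$.

If either $c_1$ or $c_2$ is zero, the correction $n(c_1c_2)$ is, of course, zero.
From the above we now have:

$b = \dfrac{\sum xy}{\sum x^2} = \dfrac{\sum (X - A)(Y - B) - n(c_1c_2)}{\sum (X - A)^2 - nc_1^2}$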

Example III. Determine the equation of the straight line which best fits the following data:

X:   10   12   13   16   17   20   25
Y:   19   22   24   27   29   33   37

   X    (X-16)   (X-16)^2     Y    (Y-27)   (X-16)(Y-27)   (Y-27)^2
  10      -6        36       19      -8         +48            64
  12      -4        16       22      -5         +20            25
  13      -3         9       24      -3          +9             9
  16       0         0       27       0           0             0
  17      +1         1       29      +2          +2             4
  20      +4        16       33      +6         +24            36
  25      +9        81       37     +10         +90           100
          --       ---               --        ----           ---
          +1       159               +2        +193           238

$\dfrac{\sum (X - 16)}{7} = 0.143 = c_1$;   $\dfrac{\sum (Y - 27)}{7} = 0.286 = c_2$.

Then $b = \dfrac{\sum xy}{\sum x^2} = \dfrac{193 - 7(0.143 \times 0.286)}{159 - 7(0.143)^2} = 1.213$

i.e., $y = 1.213x$ with origin at (16.143, 27.286)

or $Y - 27.286 = 1.213(X - 16.143)$

i.e., $Y = 1.213X + 7.705$ with origin at (0, 0).

The above equation gives an average value of Y likely to be associated with any given value of X.
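The corrections for arbitrary working origins may be sketched in Python as follows. The function name `slope_from_working_origins` is hypothetical and the data are those of Example III; the point of the sketch is that the same line is obtained whatever origins A and B are chosen, provided the corrections $n c_1 c_2$ and $n c_1^2$ are applied with their signs.

```python
def slope_from_working_origins(xs, ys, A, B):
    """Regression of Y on X from deviations measured at arbitrary origins A, B."""
    n = len(xs)
    dx = [x - A for x in xs]
    dy = [y - B for y in ys]
    c1 = sum(dx) / n             # X_bar - A, sign retained
    c2 = sum(dy) / n             # Y_bar - B, sign retained
    sum_xy = sum(u * v for u, v in zip(dx, dy)) - n * c1 * c2
    sum_xx = sum(u * u for u in dx) - n * c1 ** 2
    b = sum_xy / sum_xx
    x_bar, y_bar = A + c1, B + c2
    a = y_bar - b * x_bar        # intercept with (0, 0) as origin
    return a, b


X = [10, 12, 13, 16, 17, 20, 25]
Y = [19, 22, 24, 27, 29, 33, 37]
a_16_27, b_16_27 = slope_from_working_origins(X, Y, A=16, B=27)
a_0_0, b_0_0 = slope_from_working_origins(X, Y, A=0, B=0)
print(round(b_16_27, 3), round(a_16_27, 3))
```

Working to full accuracy gives an intercept of about 7.70; the figure 7.705 in the text results from rounding b to 1.213 before multiplying.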

The above procedure assumes that the X values are correct, and that 
all the " error " of observation, or whatever it may be, is in the Y values. 





This may be true, as it obviously is in Example II above; but often it is 
not, and an equation giving values of X in terms of Y is equally 
important. 

This equation can be obtained simply by changing the roles of X and Y in the Normal Equations, which then become:

$\sum X = na + b\sum Y$ .... (i)

$\sum XY = a\sum Y + b\sum Y^2$ .... (ii)

or, using deviations from the respective means, $x = \dfrac{\sum xy}{\sum y^2}\,y$.

In the above example $\sum (Y - 27)^2 = 238$, so that

$\dfrac{\sum xy}{\sum y^2} = \dfrac{193 - 7(0.143 \times 0.286)}{238 - 7(0.286)^2} = 0.812$

$X - 16.143 = 0.812(Y - 27.286)$

$X = 0.812Y - 6.013$.


II. The Coefficient of Correlation 


Calculation of the Coefficient of Correlation 

The value of r, the coefficient of correlation between two variables, X and Y, is given by the formulae

$r = \dfrac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \dfrac{\sum xy}{n\sigma_x\sigma_y}$

where x and y are deviations from the respective means.

The method of calculation of r for two simple series has already been sufficiently indicated: $\sum xy$, $\sum x^2$, $\sum y^2$, or $\sum xy$, $\sigma_x$, $\sigma_y$ are calculated as in the above examples, and r is then calculated from one of the above formulae.

It is, of course, not necessary to obtain the equations to the two lines of best fit in order to calculate r.

The lines of best fit are known as the regression lines, Y on X 
and X on Y, respectively. 





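The regression coefficients are often stated in the form $r\dfrac{\sigma_y}{\sigma_x}$ and $r\dfrac{\sigma_x}{\sigma_y}$.

The reader should verify for himself that

$r\dfrac{\sigma_y}{\sigma_x} = \dfrac{\sum xy}{\sum x^2}$ and $r\dfrac{\sigma_x}{\sigma_y} = \dfrac{\sum xy}{\sum y^2}$.

$\dfrac{\sum xy}{n}$, or $r\sigma_x\sigma_y$, is called the "covariance" of the variables X and Y.

This is sometimes written $\mu_{11}$.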
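The verification suggested to the reader can also be carried out numerically. The sketch below uses the data of Example III; the variable names are chosen for this illustration only.

```python
from math import sqrt

X = [10, 12, 13, 16, 17, 20, 25]
Y = [19, 22, 24, 27, 29, 33, 37]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sum_xx = sum((x - x_bar) ** 2 for x in X)
sum_yy = sum((y - y_bar) ** 2 for y in Y)

sigma_x = sqrt(sum_xx / n)
sigma_y = sqrt(sum_yy / n)
covariance = sum_xy / n

r = sum_xy / sqrt(sum_xx * sum_yy)      # first formula for r
r_alt = sum_xy / (n * sigma_x * sigma_y)  # second formula for r
b_yx = sum_xy / sum_xx                  # regression coefficient, Y on X
b_xy = sum_xy / sum_yy                  # regression coefficient, X on Y

print(round(r, 4), round(b_yx, 3), round(b_xy, 3))
```

It follows from the two identities that the product of the two regression coefficients equals $r^2$, which gives a quick check on the arithmetic.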


Calculation of the Coefficient of Correlation from a Correlation 
Table 

Example IV.

 Marks in                          Marks in English (X)                    Total
 Mathematics (Y)    10-   20-   30-   40-   50-   60-   70-   80-           (Y)
     10-              1     5     8     -     1     -     1     -            16
     20-              5    12    16     1     1     -     -     -            35
     30-              -    10    20    12     4     1     -     -            47
     40-              -     6    18    29    15     2     -     -            70
     50-              2     5    18    30    35     -     1     -            91
     60-              -     3     8    12    18    30     7     2            80
     70-              -     -     2     4     5    16    10     4            41
     80-              -     -     -     -     2     4     6     8            20
 Total (X)            8    41    90    88    81    53    25    14           400


Explanation of the Table 

The above correlation table shows the marks obtained by 400 examination candidates in English (X) and in Mathematics (Y).

The interval is in each case 10 marks, the lower limits being stated. The intervals are consequently 10-19, 20-29, ... with mid-points 14.5, 24.5, ...

The frequencies are given in the body of the table. For example, 10 candidates had between 20 and 29 marks in English and between 30 and 39 for Mathematics, 6 had between 20 and 29 marks for English and between 40 and 49 for Mathematics, and so on.

The calculation may be set out compactly as in the table on the next page.






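[Working table for Example IV, not reproduced: the correlation table above, with the deviations (X - 44.5) and (Y - 54.5) in class-interval units, the products of these deviations in brackets in each cell, and the row and column totals.]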




$\sigma_x$ and $\sigma_y$ are calculated in the usual way.

Note that deviations are in class interval units.

$\frac{1}{400}\sum f(X - 44.5) = +\frac{122}{400} = +0.305$

$\frac{1}{400}\sum f(X - 44.5)^2 = \frac{1068}{400} = 2.67$

$\sigma_x^2 = 2.67 - (0.305)^2$,   $\sigma_x = 1.6053$

$\frac{1}{400}\sum f(Y - 54.5) = -\frac{111}{400} = -0.2775$

$\frac{1}{400}\sum f(Y - 54.5)^2 = \frac{1253}{400} = 3.1325$

$\sigma_y^2 = 3.1325 - (-0.2775)^2$,   $\sigma_y = 1.748$.


Calculation of the Covariance 

The numbers in brackets in the cells of the above table are the products (X - 44.5)(Y - 54.5) expressed in class interval units squared and with the appropriate signs. The cells of the table in the column for which (X - 44.5) = 0, and in the row for which (Y - 54.5) = 0, have obviously zero for product. Of the four other sections into which the table is divided, the top left and bottom right-hand sections have all signs positive, while in the other two sections all signs are negative.

To obtain $\sum (X - 44.5)(Y - 54.5)$ multiply the frequency in each cell by the corresponding number in brackets and find the algebraic sum of the products.

Sum of all the products = +744.

$\sum xy = 744 - 400(+0.305)(-0.2775) = 777.855$

$r = \dfrac{\sum xy}{n\sigma_x\sigma_y} = \dfrac{777.855}{400(1.6053)(1.748)} = 0.693$.


The Regression Equations

$\bar{X} = 44.5 + 0.305 \times 10 = 47.55$;   $\bar{Y} = 54.5 - 0.2775 \times 10 = 51.725$

$Y - 51.73 = r\dfrac{\sigma_y}{\sigma_x}(X - 47.55)$

and

$X - 47.55 = r\dfrac{\sigma_x}{\sigma_y}(Y - 51.73)$.
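The whole of the calculation for Example IV can be reproduced from the frequencies alone. The Python sketch below is illustrative only: the name `grouped_correlation` is invented here, the blank cells of the table are entered as zeros, and the working origins are the classes with mid-points 44.5 (English) and 54.5 (Mathematics) as in the text.

```python
from math import sqrt

# Rows: Mathematics classes 10-, 20-, ..., 80-; columns: English classes 10-, ..., 80-.
FREQ = [
    [1, 5, 8, 0, 1, 0, 1, 0],
    [5, 12, 16, 1, 1, 0, 0, 0],
    [0, 10, 20, 12, 4, 1, 0, 0],
    [0, 6, 18, 29, 15, 2, 0, 0],
    [2, 5, 18, 30, 35, 0, 1, 0],
    [0, 3, 8, 12, 18, 30, 7, 2],
    [0, 0, 2, 4, 5, 16, 10, 4],
    [0, 0, 0, 0, 2, 4, 6, 8],
]


def grouped_correlation(freq, origin_col, origin_row):
    """Correlation from a frequency table, working in class-interval units."""
    n = sum(sum(row) for row in freq)
    s_x = s_xx = s_y = s_yy = s_xy = 0
    for i, row in enumerate(freq):
        dy = i - origin_row                  # deviation of Y in class units
        for j, f in enumerate(row):
            dx = j - origin_col              # deviation of X in class units
            s_x += f * dx
            s_xx += f * dx * dx
            s_y += f * dy
            s_yy += f * dy * dy
            s_xy += f * dx * dy
    c1, c2 = s_x / n, s_y / n
    sigma_x = sqrt(s_xx / n - c1 ** 2)
    sigma_y = sqrt(s_yy / n - c2 ** 2)
    sum_xy = s_xy - n * c1 * c2              # corrected product sum
    r = sum_xy / (n * sigma_x * sigma_y)
    return {"n": n, "c1": c1, "c2": c2, "sigma_x": sigma_x,
            "sigma_y": sigma_y, "sum_xy": sum_xy, "r": r}


result = grouped_correlation(FREQ, origin_col=3, origin_row=4)
mean_x = 44.5 + 10 * result["c1"]            # back to marks: class interval is 10
mean_y = 54.5 + 10 * result["c2"]
print(round(result["r"], 3), mean_x, mean_y)
```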





Example V.

                            Values of X
 Values of Y    10-   15-   20-   25-   30-   35-    Totals
    15-           1     3     1     -     -     -        5
    18-           2     5     7     6     -     -       20
    21-           2     5     8    12     3     -       30
    24-           -     4     4     8     7     2       25
    27-           -     -     -     6     6     3       15
    30-           -     -     -     -     4     1        5
  Totals          5    17    20    32    20     6      100

(i) Verify that the above bivariate distribution has $\bar{X} = 25.65$ and $\bar{Y} = 23.7$.

Check the following results, which are expressed in class interval units.

$\sum f(X - \bar{X})^2 = 163.31$;   $\sum f(Y - \bar{Y})^2 = 154$;   $\sum f(X - \bar{X})(Y - \bar{Y}) = 98.8$.

(ii) Use the above results to calculate the coefficient of correlation and the two regression coefficients.

$r = \dfrac{\sum xy}{(\sum x^2 \sum y^2)^{1/2}} = \dfrac{98.8}{(163.31 \times 154)^{1/2}} = 0.623$.

It will be noted that the class interval of X is 5 and that of Y is 3.

The actual value of r in ordinary units is

$\dfrac{98.8 \times 5 \times 3}{\{(163.31 \times 25)(154 \times 9)\}^{1/2}} = \dfrac{98.8 \times 15}{(163.31 \times 154)^{1/2} \times 15} = 0.623$

so that the fact that the class intervals are different does not affect the calculation of r when class interval units are used.

In calculating the regression coefficients care is needed.

$b_1 = \dfrac{\sum xy}{\sum x^2}$. To express this in ordinary units it is necessary to multiply the expression by $\dfrac{5 \times 3}{5 \times 5}$.

Similarly $b_2 = \dfrac{\sum xy}{\sum y^2}$ must be multiplied by $\dfrac{5 \times 3}{3 \times 3}$,

so that $b_1 = \dfrac{98.8}{163.31} \times \dfrac{3}{5} = 0.605 \times \dfrac{3}{5} = 0.363$

and $b_2 = \dfrac{98.8}{154} \times \dfrac{5}{3} = 0.6416 \times \dfrac{5}{3} = 1.069$.
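The conversion from class-interval units to ordinary units can be checked by working the table both ways. In the Python sketch below the class mid-points (12.5, 17.5, ... for X and 16.5, 19.5, ... for Y) are an assumption, chosen because they reproduce the stated means 25.65 and 23.7; the function name `sums_about_means` is likewise hypothetical.

```python
from math import sqrt

# Rows: Y classes 15-, 18-, ..., 30-; columns: X classes 10-, 15-, ..., 35-.
FREQ = [
    [1, 3, 1, 0, 0, 0],
    [2, 5, 7, 6, 0, 0],
    [2, 5, 8, 12, 3, 0],
    [0, 4, 4, 8, 7, 2],
    [0, 0, 0, 6, 6, 3],
    [0, 0, 0, 0, 4, 1],
]
X_MID = [12.5, 17.5, 22.5, 27.5, 32.5, 37.5]   # assumed mid-points
Y_MID = [16.5, 19.5, 22.5, 25.5, 28.5, 31.5]   # assumed mid-points


def sums_about_means(freq, x_vals, y_vals):
    """Means and sums of squares and products about the means."""
    n = sum(sum(row) for row in freq)
    mean_x = sum(f * x for row in freq for f, x in zip(row, x_vals)) / n
    mean_y = sum(f * y for row, y in zip(freq, y_vals) for f in row) / n
    sum_xx = sum(f * (x - mean_x) ** 2 for row in freq for f, x in zip(row, x_vals))
    sum_yy = sum(f * (y - mean_y) ** 2 for row, y in zip(freq, y_vals) for f in row)
    sum_xy = sum(f * (x - mean_x) * (y - mean_y)
                 for row, y in zip(freq, y_vals) for f, x in zip(row, x_vals))
    return mean_x, mean_y, sum_xx, sum_yy, sum_xy


# Ordinary units, using the assumed mid-points.
mean_x, mean_y, sxx, syy, sxy = sums_about_means(FREQ, X_MID, Y_MID)
# Class-interval units: classes simply numbered 0, 1, 2, ...
_, _, uxx, uyy, uxy = sums_about_means(FREQ, range(6), range(6))

r_units = uxy / sqrt(uxx * uyy)
r_ordinary = sxy / sqrt(sxx * syy)
b1 = (uxy / uxx) * 3 / 5      # Y on X, converted to ordinary units
b2 = (uxy / uyy) * 5 / 3      # X on Y, converted to ordinary units
print(round(r_units, 3), round(b1, 3), round(b2, 3))
```

The coefficient r is the same in either system of units, whereas each regression coefficient has to be multiplied by the ratio of the two class intervals.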




Regression Equations—Standard Deviations 

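Let the regression equations be written $y = b_1x$ and $x = b_2y$, where

$b_1 = \dfrac{\sum xy}{\sum x^2} = \dfrac{p}{\sigma_x^2}$ and $b_2 = \dfrac{\sum xy}{\sum y^2} = \dfrac{p}{\sigma_y^2}$,

p being the covariance, $\sum xy / n$.

Then the sum of squares of differences between observed and calculated values of x

$= \sum (x - b_2y)^2$
$= \sum x^2 - 2b_2\sum xy + b_2^2\sum y^2$
$= \sum x^2 - b_2(2\sum xy - b_2\sum y^2)$

i.e., $nS_x^2 = n\sigma_x^2 - nb_2(2p - b_2\sigma_y^2) = n\sigma_x^2 - nb_2p$, since $b_2\sigma_y^2 = p$,

i.e., $S_x^2 = \sigma_x^2 - \dfrac{p^2}{\sigma_y^2} = \sigma_x^2(1 - r^2)$, since $p = r\sigma_x\sigma_y$,

and $S_x = \sigma_x\sqrt{1 - r^2}$.

Similarly, $S_y^2 = \sigma_y^2(1 - r^2)$.

Thus $S_y = \sigma_y\sqrt{1 - r^2}$.

It follows, since the sum of squared deviations cannot be negative, that

$-1 \le r \le +1$.

$S_x$ and $S_y$ are called the "standard errors" of the regression lines, x on y and y on x, respectively.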


Example VI. Calculate (i) r, (ii) $b_1$ and $b_2$, and (iii) $S_y^2$ from the following data:

   X      x     x^2      Y      y      xy     y^2
   5     -1      1       8      0       0      0
   7      1      1       9      1       1      1
   3     -3      9       5     -3       9      9
   1     -5     25       4     -4      20     16
   9      3      9       9      1       3      1
  12      6     36      13      5      30     25
   8      2      4       7     -1      -2      1
   3     -3      9       9      1      -3      1
         --     --             --    ------   --
          0     94              0    63 - 5   54

$\sigma_y^2 = 6.75$;  $\sigma_x^2 = 11.75$;  $p = 7.25$;  $r = 0.814$;  $b_1 = 0.617$;  $b_2 = 1.074$;  $y = 0.617x$.

  Obs. y     Calc. y        d         d^2
     0       -0.617      +0.617     0.3807
     1       +0.617      +0.383     0.1467
    -3       -1.851      -1.149     1.3202
    -4       -3.085      -0.915     0.8372
     1       +1.851      -0.851     0.7242
     5       +3.702      +1.298     1.6848
    -1       +1.234      -2.234     4.9908
     1       -1.851      +2.851     8.1282
                                   -------
                                   18.2128

$S_y^2 = \dfrac{\sum d^2}{n} = \dfrac{18.2128}{8} = 2.276$.

By formula, $S_y^2 = \sigma_y^2(1 - r^2) = 6.75\{1 - (0.814)^2\} = 2.277$.
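The agreement between the direct calculation of $S_y^2$ and the formula $\sigma_y^2(1 - r^2)$ can be confirmed as below. This is a sketch using the data of Example VI; the variable names are chosen for the illustration.

```python
from math import sqrt

X = [5, 7, 3, 1, 9, 12, 8, 3]
Y = [8, 9, 5, 4, 9, 13, 7, 9]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
x = [v - x_bar for v in X]           # deviations from the mean of X
y = [v - y_bar for v in Y]           # deviations from the mean of Y

sum_xx = sum(u * u for u in x)
sum_yy = sum(v * v for v in y)
sum_xy = sum(u * v for u, v in zip(x, y))

b1 = sum_xy / sum_xx                 # regression of y on x
b2 = sum_xy / sum_yy                 # regression of x on y
r = sum_xy / sqrt(sum_xx * sum_yy)

# (a) directly, from the observed and calculated values of y
residuals = [v - b1 * u for u, v in zip(x, y)]
s_y2_direct = sum(d * d for d in residuals) / n
# (b) by formula
s_y2_formula = (sum_yy / n) * (1 - r * r)

print(round(r, 3), round(b1, 3), round(b2, 3), round(s_y2_direct, 4))
```

The small difference between 2.276 and 2.277 in the text arises only from rounding r and $b_1$ to three figures; carried to full accuracy the two methods agree exactly.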


Diagonal Summation

Example VII.

[Correlation table, not reproduced: columns X = 7 to 12, i.e., (X - 10) = -3 to +2; rows Y = 4 to 10, i.e., (Y - 7) = -3 to +3; with diagonal lines drawn through the cells. The row totals (Y) are 4, 14, 21, 23, 23, 9, 6; the column totals (X) are 4, 8, 21, 33, 24, 10; total frequency 100.]

Verify the following:

$\dfrac{\sum f(X - 10)}{100} = -0.05$;  $\sum f(X - \bar{X})^2 = 152.75$;  $\sigma_x^2 = 1.5275$;  $\sigma_x = 1.23592$;

$\dfrac{\sum f(Y - 7)}{100} = -0.02$;  $\sum f(Y - \bar{Y})^2 = 225.96$;  $\sigma_y^2 = 2.2596$;  $\sigma_y = 1.50320$;

$\sum f(X - \bar{X})(Y - \bar{Y}) = 118.9$;  $r = \dfrac{118.9}{185.783} = 0.640$.

If the frequencies along the diagonal lines in the table are summed they relate to a new variable (X - Y). Set them out in a frequency table as shown and calculate the standard deviation.

  d = (X - Y)      f       fd      fd^2
       2          11       22       44
       1          23       23       23
       0          30        0        0
      -1          25      -25       25
      -2          10      -20       40
      -3           1       -3        9
                 ---      ---      ---
                 100       -3      141

Mean difference $= \bar{d} = -\dfrac{3}{100} = -0.03$,

i.e., $-0.05 - (-0.02) = -0.03$

$\sum f(d - \bar{d})^2 = 141 - 100(-0.03)^2 = 140.91$

$\sigma_d^2 = 1.4091$.




If frequencies are summed along the other diagonals they relate to the variable (X + Y). It will be found that

Mean sum $= \bar{s} = -0.07 = -0.05 + (-0.02)$

$\sum f(s - \bar{s})^2 = 617 - 100(-0.07)^2 = 616.51$

$\sigma_s^2 = 6.1651$.

It can be shown (see page 180) that

$\sigma_d^2 = \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y$ .... (i)

$\sigma_s^2 = \sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y$ .... (ii)

From (i), $1.4091 = 1.5275 + 2.2596 - 2r(1.23592 \times 1.50320)$,

whence $r = 0.640$,

which checks the result obtained above.
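The check by diagonal summation amounts to solving relation (i) or (ii) for r. A minimal Python sketch, using only the variances quoted in Example VII (the function names are invented for the illustration):

```python
from math import sqrt


def r_from_difference(var_x, var_y, var_d):
    """Solve var_d = var_x + var_y - 2 r sigma_x sigma_y for r."""
    return (var_x + var_y - var_d) / (2 * sqrt(var_x * var_y))


def r_from_sum(var_x, var_y, var_s):
    """Solve var_s = var_x + var_y + 2 r sigma_x sigma_y for r."""
    return (var_s - var_x - var_y) / (2 * sqrt(var_x * var_y))


VAR_X, VAR_Y = 1.5275, 2.2596
r_d = r_from_difference(VAR_X, VAR_Y, 1.4091)
r_s = r_from_sum(VAR_X, VAR_Y, 6.1651)
print(round(r_d, 3), round(r_s, 3))
```

Both diagonals lead to the same value of r, so either may be used as the check.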

III. Rank Correlation 

Example VIII. A set of n persons is given two separate tests. They are then placed in order of merit, i.e., ranked, in respect to each of the tests, thus:

                      Rank             Difference of
 Individual     Test I    Test II        rank (d)        d^2
     A             1         3              2             4
     B             2         1              1             1
     C             3         5              2             4
     D             4         4              0             0
     E             5         2              3             9
                                                         --
                                                         18

The rank correlation coefficient measures the relationship between the 
two sets of ranks. 

If no two or more persons are bracketed equal in either list, what is required is the correlation between the numbers 1 to n (set X) and the same set of numbers in any other order (set Y).

The formula for this coefficient is derived from

$r = \dfrac{\sum xy}{(\sum x^2 \sum y^2)^{1/2}}$

It has already been shown that the mean of the first n natural numbers is $\frac{1}{2}(n + 1)$ and their variance $\frac{1}{12}(n^2 - 1)$. Their "sum of squares" (of deviations from the mean) is therefore $n(n^2 - 1)/12$.

Also the sum of the squares of the first n numbers is $\frac{1}{6}n(2n + 1)(n + 1)$.

We have therefore $\bar{X} = \bar{Y} = \dfrac{n + 1}{2}$ and

$\sum (X - \bar{X})^2 = \sum (Y - \bar{Y})^2 = n(n^2 - 1)/12$,

so that the denominator of the above formula, viz. $(\sum x^2 \sum y^2)^{1/2}$, is $n(n^2 - 1)/12$.

Let $d_1$ = difference between A's ranks in the two tests,
$d_2$ = difference between B's ranks in the two tests, etc.

Then $\sum d^2 = \sum (X - Y)^2 = \sum X^2 + \sum Y^2 - 2\sum XY$

so that $2\sum XY = \sum X^2 + \sum Y^2 - \sum d^2$,

but $\sum X^2 = \sum Y^2 = n(2n + 1)(n + 1)/6$

so that $\sum XY = \dfrac{n(2n + 1)(n + 1)}{6} - \frac{1}{2}\sum d^2$

but $\sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - n\bar{X}\bar{Y} = \dfrac{n(2n + 1)(n + 1)}{6} - \dfrac{n(n + 1)^2}{4} - \frac{1}{2}\sum d^2$

so that

$r = \dfrac{\frac{1}{6}n(2n + 1)(n + 1) - \frac{1}{4}n(n + 1)^2 - \frac{1}{2}\sum d^2}{\frac{1}{12}n(n^2 - 1)}$

$= \dfrac{(n^2 - n)(n + 1) - 6\sum d^2}{n(n^2 - 1)}$

$= 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$

For the above example, therefore,

$r = 1 - \dfrac{6 \times 18}{5 \times 24} = 0.1$.

If the ranks of n individuals in the second test are exactly the same as in the first test then $\sum d^2 = 0$ and the rank correlation coefficient is therefore +1, which is a case of perfect direct correlation.
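The rank formula can be tested against the ordinary product-moment coefficient computed on the ranks themselves. The Python sketch below is illustrative (the function names are chosen here) and uses the rankings of Example VIII; it assumes there are no tied ranks.

```python
from math import sqrt


def rank_correlation(rank1, rank2):
    """Rank correlation, r = 1 - 6*sum(d^2) / (n(n^2 - 1)); no tied ranks."""
    n = len(rank1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def product_moment(xs, ys):
    """Ordinary coefficient r = sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    sum_yy = sum((y - y_bar) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


TEST_I = [1, 2, 3, 4, 5]
TEST_II = [3, 1, 5, 4, 2]
r_rank = rank_correlation(TEST_I, TEST_II)
r_pm = product_moment(TEST_I, TEST_II)
print(round(r_rank, 3), round(r_pm, 3))
```

Identical rankings give +1 and an exactly reversed ranking gives -1, which may be confirmed by calling `rank_correlation(TEST_I, TEST_I)` and `rank_correlation(TEST_I, TEST_I[::-1])`.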

If the ranking in the second test is exactly the reverse of the ranking in the first test, then it can be shown that $\dfrac{6\sum d^2}{n(n^2 - 1)} = 2$, and that consequently the coefficient is -1, which indicates perfect inverse correlation. We therefore have $-1 \le r \le +1$.

If two individuals are placed equal in a test, say at the tenth rank, then each is given rank 10½ and the next rank is 12. If three individuals tie for the fifteenth rank, then each is ranked 16, the next rank being 18. This procedure, while often unavoidable, is objectionable, since the formula for the rank coefficient of correlation presupposes that no two or more persons have the same rank in the same test.


Example IX.

 Student:                A     B     C     D     E     F     G     H     I     J
 Rank in first test:     3     5     4    10     8     8     1     6     8     2
 Rank in second test:   5½    5½     1    8½     4    10     2     7    8½     3
 d = difference:        2½     ½     3    1½     4     2     1     1     ½     1
 d^2:                   6¼     ¼     9    2¼    16     4     1     1     ¼     1

$r = 1 - \dfrac{6 \times 41}{990} = 0.751$.

It will be noted that three students tie for the seventh place in Test 1, and therefore each has rank 8; in Test 2 two students tie for the fifth place, thus ranking 5½ and 5½, and two for the eighth place, thus ranking 8½ and 8½.

The formula, $r = 1 - \dfrac{6\sum d^2}{n^3 - n}$, in such cases, needs a correction; one way in which this may be done is as follows: *

Let t = number of items (students in this example) involved in a tie, whether in the first or second ranking.

Then $\sum d^2$, obtained as above, is increased in respect of each tie by $\frac{1}{12}(t^3 - t)$.

Since there is one tie of three students in the first ranking and two ties of two students each in the second ranking the formula becomes:

$r = 1 - \dfrac{6\{41 + \frac{1}{12}(3^3 - 3) + \frac{2}{12}(2^3 - 2)\}}{990} = 1 - \dfrac{6 \times 44}{990} = 0.733$.
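The correction for tied ranks described above can be sketched in Python as follows. The names `tie_correction` and `rank_correlation` are chosen for this illustration; the ranks are those of Example IX, with the half ranks written as decimals.

```python
from collections import Counter


def tie_correction(ranks):
    """Sum of (t^3 - t)/12 over every group of t equal ranks."""
    return sum((t ** 3 - t) / 12 for t in Counter(ranks).values() if t > 1)


def rank_correlation(rank1, rank2, correct_ties=True):
    """Rank correlation, optionally increasing sum(d^2) for each tie."""
    n = len(rank1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    if correct_ties:
        sum_d2 += tie_correction(rank1) + tie_correction(rank2)
    return 1 - 6 * sum_d2 / (n ** 3 - n)


TEST_1 = [3, 5, 4, 10, 8, 8, 1, 6, 8, 2]
TEST_2 = [5.5, 5.5, 1, 8.5, 4, 10, 2, 7, 8.5, 3]

r_uncorrected = rank_correlation(TEST_1, TEST_2, correct_ties=False)
r_corrected = rank_correlation(TEST_1, TEST_2)
print(round(r_uncorrected, 3), round(r_corrected, 3))
```

Here the tie of three adds 2 to $\sum d^2$ and each tie of two adds ½, so that 41 becomes 44.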

IV. Time Series 
Correlation between Time Series 

The coefficient of correlation between the variables X and Y, 
where the data are two series of annual figures, for example, is not 
in general of much use; it may in fact be misleading, or nonsensical. 
This is due to the fact that it is strongly affected by the existence of 
similar, or of opposite, trends in the two series. 

If both series have an upward, or downward, trend during the period under consideration, a moderate or high value of +r may result merely because of this fact; whereas a moderate or high value of -r may be due to one of the series having a decided upward and the other a decided downward trend.

* The problem of "tied ranks" is discussed at length in: Rank Correlation Methods, M. G. Kendall, M.A., Chas. Griffin & Co.

It is usual, therefore, to find the coefficient of correlation between 
the deviations of the two series from their respective trends; the 
simplest method is to measure deviations from the moving average 
lines. 

Furthermore, although there actually may be a considerable degree of correlation between the variables, this may not be immediately apparent, because of the existence of a time-lag of one, two or more periods, i.e., years, quarters, etc.

If the data are plotted on a graph it may be observed that the two series are similar in shape, but that the hills and valleys in series Y regularly appear, e.g., two periods behind the similar features of series X. In this case the figures of series Y are advanced two periods before the coefficient of correlation is calculated.

It is unlikely that there will be any such convenient regularity in 
actual data, and it may be necessary to make allowance for various 
amounts of lag, in order to ascertain what amount yields a 
maximum value of r. 

Example X.

 Period:      1     2     3     4     5     6     7     8     9    10    11
 Series X:  110   116   125   130   125   113   115   120   124   116   106
 Series Y:   89   101    96    89    94   102   105    99    87    93    92

 Period:     12    13    14    15    16    17    18    19    20    21
 Series X:  107   109   119   117   110   121   129   133   128   126
 Series Y:   98    96    84    87    95    97    94    87    93    96

(i) Calculate the coefficient of correlation between X and Y with the data as given.

$\bar{X} = 119$,  $\bar{Y} = 94$,  $\sum (X - \bar{X})^2 = 1298$,  $\sum (Y - \bar{Y})^2 = 600$,

$\sum (X - \bar{X})(Y - \bar{Y}) = -262$

$r = \dfrac{-262}{(1298 \times 600)^{1/2}} = -0.30$.

(ii) Calculate the value of r, with the data as given, allowing for a 2-period time-lag in series Y,

i.e.,   Period:       1     2     3
        Series X:   125   130   125
        Series Y:    89   101    96    etc.

$\sum (X - \bar{X})^2 = 1200.42$,  $\sum (Y - \bar{Y})^2 = 594.95$;

$\sum (X - \bar{X})(Y - \bar{Y}) = 335.63$

$r = +0.40$.





(iii) Calculate a 5-period moving average for each series, allowing for a 2-period time lag in Series Y as above, and set down the deviations of each series from its moving average trend.

 X:    3.4   -7.6   -4.4    2.4    7.8    1.4   -6.4   -4.4
 Y:    2.2   -7.4   -3.2    4.2    7.6    1.8   -8.2   -0.8

 X:   -2.6    6.6    1.8   -9.2   -1.0    4.8    5.6
 Y:   -1.2    5.4    4.6   -8.0   -4.8    3.6    5.0

(iv) Calculate the coefficient of correlation between the deviations of the series from their respective trends.

$\sum (X - \bar{X})^2 = 410.94$,  $\sum (Y - \bar{Y})^2 = 393.08$,

$\sum (X - \bar{X})(Y - \bar{Y}) = 376.30$,

$r = 0.94$.
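The four steps of Example X (raw correlation, allowance for lag, removal of trend by a moving average, correlation of the deviations) can be followed in the Python sketch below. The helper names are hypothetical; the pairing used for the lag is that shown in part (ii), i.e., the X of period t + 2 is set against the Y of period t.

```python
from math import sqrt

X = [110, 116, 125, 130, 125, 113, 115, 120, 124, 116, 106,
     107, 109, 119, 117, 110, 121, 129, 133, 128, 126]
Y = [89, 101, 96, 89, 94, 102, 105, 99, 87, 93, 92,
     98, 96, 84, 87, 95, 97, 94, 87, 93, 96]


def correlation(xs, ys):
    """Product-moment coefficient of two equal-length series."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    sum_yy = sum((y - y_bar) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


def lagged_pairs(xs, ys, lag):
    """Pair X of period t + lag with Y of period t."""
    return xs[lag:], ys[:len(ys) - lag]


def deviations_from_moving_average(series, window=5):
    """Deviation of each item from the moving average centred on it."""
    half = window // 2
    return [series[i] - sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]


r_raw = correlation(X, Y)                              # part (i)
x_lag, y_lag = lagged_pairs(X, Y, 2)
r_lag = correlation(x_lag, y_lag)                      # part (ii)
x_dev = deviations_from_moving_average(x_lag)          # part (iii)
y_dev = deviations_from_moving_average(y_lag)
r_detrended = correlation(x_dev, y_dev)                # part (iv)
print(round(r_raw, 2), round(r_lag, 2), round(r_detrended, 2))
```

To find the lag giving the maximum value of r, `lagged_pairs` may simply be called with lag = 0, 1, 2, 3, ... and the results compared.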


V. Linear and Non-linear Regression 
The Correlation Ratio 

Calculate the means of each array of y's and of each array of x's in a correlation table and plot (1) mid-x and mean y for each of the y arrays and (2) mid-y and mean x for each of the x arrays.

If the mean y's lie approximately in a straight line and the mean x's do likewise, the regression is said to be linear and the correlation coefficient, r, is an adequate measure of the correlation between the variables.

Frequently, however, such a diagram, or a scatter diagram, will suggest some sort of curvilinear relationship, and in that case the use of r, which is a measure of the extent to which the relation approaches a straight-line "law", will be misleading. A low, or zero, value of r may be obtained from a bivariate distribution where the regression is strong, or, in fact, perfect. The correlation table of Question 42, Exercise 6, for example, is a (fictitious) case in point; r, calculated from these data, is less than 0.1; nevertheless the variables are definitely correlated, and the small value of r gives a very false impression. The correlation ratio is the appropriate measure when regression is curvilinear. A test for non-linearity of regression is given in Chapter X.

If regression be truly curvilinear, and a straight line of best fit be determined, then the spread of the points about this straight line must be greater than the spread about the curved regression line; consequently r, the correlation coefficient, must be less than $\eta$, the correlation ratio; $\eta$ measures the concentration of points about the curve of best fit, just as r measures the concentration about the straight line of best fit. If the regression is linear, $\eta = r$; if not, $\eta > r$.

The correlation ratio $\eta_y$, i.e., that for the regression of y on x, is the ratio of the standard deviation of the weighted means of the arrays of y's in the correlation table to the standard deviation of all the y's of the distribution:

$\eta_y = \sigma_{my}/\sigma_y$

and similarly $\eta_x = \sigma_{mx}/\sigma_x$.

$\eta$ cannot be calculated for two simple series.


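Calculation of $\sigma_{my}$

 Means of          Mean of all                                  Frequency in       (Diff.)^2 x
 arrays of y's     the y's       Difference                     arrays of y's      frequency
 $\bar{y}_1$       $\bar{y}$     $(\bar{y}_1 - \bar{y})$        $n_1$              $n_1(\bar{y}_1 - \bar{y})^2$
 $\bar{y}_2$       $\bar{y}$     $(\bar{y}_2 - \bar{y})$        $n_2$              $n_2(\bar{y}_2 - \bar{y})^2$
 $\bar{y}_3$       $\bar{y}$     $(\bar{y}_3 - \bar{y})$        $n_3$              $n_3(\bar{y}_3 - \bar{y})^2$
                                                                                   $\sum n_i(\bar{y}_i - \bar{y})^2$

$\sigma_{my}^2 = \dfrac{\sum n_i(\bar{y}_i - \bar{y})^2}{\sum n_i}$.

It can be shown that $1 \ge \eta^2 \ge r^2$. If $\eta^2 > r^2$, then the difference can be tested to discover whether or not the regression of either y on x, or x on y, or both, departs significantly from linearity. A description of this test is given in Chapter X.

The calculation of the correlation ratios is illustrated below.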

The calculation of the correlation ratios is illustrated below. 


Calculation of Correlation Ratios

Example XI. (See Exercise 6, No. 37.)

                         x =                   Total                    Mean of
     y          0     1     2     3     4       (f)      fy     fy^2    array of x's
     0          -     -     -     4     3         7       0       0       3.4286
     1          -     -    18    36     9        63      63      63       2.8571
     2          -    12    54    36     3       105     210     420       2.2857
     3          1    12    18     4     -        35     105     315       1.7143
  Total x       1    24    90    80    15       210     378     798          -
  Means of
  arrays
  of y's        3    2.5    2    1.5    1



Standard Deviation of the Weighted Means of the Arrays of y's

Deviations are measured from $\bar{y} = 378/210 = 1.8$.

The weight of each mean is the frequency of its array.

  Mean of              d
  array of y's   $(\bar{y}_i - \bar{y})$     d^2       f       fd^2
      3               +1.2                  1.44       1       1.44
      2.5             +0.7                  0.49      24      11.76
      2               +0.2                  0.04      90       3.60
      1.5             -0.3                  0.09      80       7.20
      1               -0.8                  0.64      15       9.60
                                                     ---      -----
                                                     210      33.60

$\sigma_{my}^2 = \dfrac{33.6}{210} = 0.16$

$\sigma_y^2 = \dfrac{798}{210} - (1.8)^2 = 0.56$

$\eta_y = \sqrt{\dfrac{0.16}{0.56}} = 0.5345$.

Similarly, $\bar{x} = 2.4$, $\sigma_x^2 = 0.64$.

Standard Deviation of the Weighted Means of the Arrays of x's

  Mean of              d
  array of x's   $(\bar{x}_i - \bar{x})$     d^2        f       fd^2
    3.4286           1.0286                1.0580       7      7.4060
    2.8571           0.4571                0.2089      63     13.1607
    2.2857          -0.1143                0.0131     105      1.3755
    1.7143          -0.6857                0.4702      35     16.4570
                                                      ---     -------
                                                      210     38.3992

$\sigma_{mx}^2 = 38.3992/210 = 0.1828$

$\eta_x^2 = 0.1828/0.64 = 0.286$

$\eta_x = 0.535$.

The value of r found from the above table is -0.535. The regression is, therefore, definitely linear.
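The two correlation ratios and r for Example XI can be obtained directly from the frequencies, as in the Python sketch below. The function names are invented for the illustration, and the variance of the weighted array means is computed from the array totals and frequencies, which is algebraically the same quantity as that found in the tables above.

```python
from math import sqrt

# Rows: y = 0, 1, 2, 3; columns: x = 0, 1, 2, 3, 4 (blank cells entered as 0).
FREQ = [
    [0, 0, 0, 4, 3],
    [0, 0, 18, 36, 9],
    [0, 12, 54, 36, 3],
    [1, 12, 18, 4, 0],
]
X_VALS = [0, 1, 2, 3, 4]
Y_VALS = [0, 1, 2, 3]


def eta_squared(freq, row_vals):
    """Squared correlation ratio of the row variable; each column is one array."""
    n_total = sum(sum(row) for row in freq)
    total = sum(f * v for row, v in zip(freq, row_vals) for f in row)
    total_sq = sum(f * v * v for row, v in zip(freq, row_vals) for f in row)
    variance = total_sq / n_total - (total / n_total) ** 2
    sum_t2_over_n = 0.0
    for col in zip(*freq):
        n_i = sum(col)                                   # frequency of the array
        t_i = sum(f * v for f, v in zip(col, row_vals))  # total of the array
        if n_i:
            sum_t2_over_n += t_i ** 2 / n_i
    return (sum_t2_over_n - total ** 2 / n_total) / (n_total * variance)


def transpose(freq):
    return [list(col) for col in zip(*freq)]


def correlation(freq, x_vals, y_vals):
    """Product-moment r from a frequency table."""
    n = sum(sum(row) for row in freq)
    sx = sum(f * x for row in freq for f, x in zip(row, x_vals))
    sy = sum(f * y for row, y in zip(freq, y_vals) for f in row)
    sxx = sum(f * x * x for row in freq for f, x in zip(row, x_vals))
    syy = sum(f * y * y for row, y in zip(freq, y_vals) for f in row)
    sxy = sum(f * x * y for row, y in zip(freq, y_vals) for f, x in zip(row, x_vals))
    cov = sxy / n - (sx / n) * (sy / n)
    return cov / sqrt((sxx / n - (sx / n) ** 2) * (syy / n - (sy / n) ** 2))


eta_y_sq = eta_squared(FREQ, Y_VALS)              # arrays of y's, one per value of x
eta_x_sq = eta_squared(transpose(FREQ), X_VALS)   # arrays of x's, one per value of y
r = correlation(FREQ, X_VALS, Y_VALS)
print(round(sqrt(eta_y_sq), 4), round(sqrt(eta_x_sq), 4), round(r, 4))
```

For these data both values of $\eta^2$ coincide with $r^2$, which is the numerical expression of the statement that the regression is linear.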

Alternative Method of Calculating $\eta$

$\eta_y^2 = \sigma_{my}^2/\sigma_y^2$, which may be written

$\eta_y^2 = \dfrac{\sum n_i(\bar{y}_i - \bar{y})^2}{N\sigma_y^2}$

where $\bar{y}_i$ = the mean of the ith array of y's, where i = 1, 2, 3, etc.
$\bar{y}$ = the mean of all the y's
$n_i$ = the frequency of the ith array of y's
$\sigma_y^2$ = the variance of all the y's
N = the total frequency.

This formula can be transformed to

$\eta_y^2 = \left\{\sum \dfrac{T_i^2}{n_i} - \dfrac{T^2}{N}\right\} \Big/ N\sigma_y^2$

where $T^2 = (\sum y)^2$ and

$T_i = \sum y_i$, i.e., the sum of the y's in the ith array.
In the above example, for the sake of simplicity, the arbitrary 
origin zero was adopted. The work, in general, can be greatly 
reduced by taking arbitrary origins nearer to the respective means 
of x and y. The data of Example XI is used to illustrate how the 
calculations of the correlation coefficient and ratios can be arranged 
compactly. 


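Example XII.

[Working table for Example XII, not reproduced: the correlation table of Example XI with deviations measured from the arbitrary origins x = 2 and y = 2.]

$T_1$ = the sum of the first column = 1 × 1 = 1
$T_2$ = the sum of the second column = (12 × 1) + (12 × 0) = 12
$T_3$ = the sum of the third column = (18 × 1) + (54 × 0) + (18 × -1) = 0
etc.

$\bar{x} = 2 + \dfrac{84}{210} = 2.4$,  $\sigma_x^2 = \dfrac{168}{210} - (0.4)^2 = 0.64$,  $\sigma_x = 0.8$

$\bar{y} = 2 - \dfrac{42}{210} = 1.8$,  $\sigma_y^2 = \dfrac{126}{210} - (-0.2)^2 = 0.56$,  $\sigma_y = 0.7483$

Covariance:

$\sum f(x - 2)(y - 2) = -84$

$\dfrac{1}{N}\sum (x - \bar{x})(y - \bar{y}) = \dfrac{-84 - 210(0.4)(-0.2)}{210} = -0.32$

$r = \dfrac{-0.32}{0.8 \times 0.7483} = -0.535$

$\eta_y^2 = \left\{\sum \dfrac{T_i^2}{n_i} - \dfrac{T^2}{N}\right\} \Big/ N\sigma_y^2 = \left(42 - \dfrac{42^2}{210}\right) \Big/ (210 \times 0.56) = 0.286$

$\eta_y = \sqrt{0.286} = 0.535$

$\eta_x = \sqrt{0.286} = 0.535$.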


VI. Variance of the Sum or Difference of Two Series 

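i.e., of n pairs of measurements of x and y.

Variance of sum $= \frac{1}{n}\sum \{(x + y) - (\bar{x} + \bar{y})\}^2$, where $\bar{x} + \bar{y}$ = mean of sum

$= \frac{1}{n}\sum \{(x - \bar{x}) + (y - \bar{y})\}^2$

$= \frac{1}{n}\sum (x - \bar{x})^2 + \frac{1}{n}\sum (y - \bar{y})^2 + \frac{2}{n}\sum (x - \bar{x})(y - \bar{y})$

$= \sigma_x^2 + \sigma_y^2 + \frac{2}{n}\sum (x - \bar{x})(y - \bar{y})$

$= \sigma_x^2 + \sigma_y^2 + \dfrac{2\sum (x - \bar{x})(y - \bar{y})}{n\sigma_x\sigma_y} \times \sigma_x\sigma_y$

i.e., $\sigma_{(x+y)}^2 = \sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y$.

If x and y are uncorrelated then r = 0

and $\sigma_{(x+y)}^2 = \sigma_x^2 + \sigma_y^2$.

Similarly it can be shown that

$\sigma_{(x-y)}^2 = \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y$

$= \sigma_x^2 + \sigma_y^2$, if the variables are uncorrelated.


Numerical Example

 x:          1     2     3     4     5
 y:         12     5     7     9    12
 (x + y):   13     7    10    13    17

$\sigma_x^2 = 2$,  $\sigma_y^2 = 7.6$,  $\sigma_{(x+y)}^2 = 11.2$,  $r_{xy} = 0.205$

$\sigma_{(x+y)}^2 = \sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y = 2 + 7.6 + 2(0.205)(1.414)(2.757) = 11.2$

and $\sigma_{(x-y)}^2 = 2 + 7.6 - 2(0.205)(1.414)(2.757) = 8.0$.

More generally, if $z = ax + by$,
then $\sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,r_{xy}\sigma_x\sigma_y$.

This formula can be expanded to any number of terms.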
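The numerical example, and the more general formula for $z = ax + by$, can be checked as follows. This Python sketch is an illustration only; the function names and the particular multipliers a = 2, b = -3 used in the last check are assumptions made for the purpose.

```python
from math import sqrt


def variance(values):
    """Variance about the mean, with divisor n."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def correlation(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    return cov / sqrt(variance(xs) * variance(ys))


x = [1, 2, 3, 4, 5]
y = [12, 5, 7, 9, 12]

var_x, var_y = variance(x), variance(y)
r = correlation(x, y)
cross = 2 * r * sqrt(var_x) * sqrt(var_y)

var_sum = variance([u + v for u, v in zip(x, y)])
var_diff = variance([u - v for u, v in zip(x, y)])

# General case z = a*x + b*y, with illustrative multipliers.
a, b = 2, -3
var_z = variance([a * u + b * v for u, v in zip(x, y)])
var_z_formula = a * a * var_x + b * b * var_y + a * b * cross

print(var_x, var_y, round(r, 3), var_sum, var_diff)
```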



VII. Elementary Curve Fitting

Method of Least Squares 

(a) Straight Lines 

Let $x_1, x_2, \ldots, x_n$, $y_1, y_2, \ldots, y_n$ be observed corresponding values of the variables x and y, to which it is required to fit a straight line.

Write the equation to the straight line as $y = a_0 + a_1x$. As has been explained, the problem is to find values of $a_0$ and $a_1$ such that $\sum (y - a_0 - a_1x)^2$ is a minimum.

The method is to expand this expression and differentiate it (i) with respect to $a_0$, $a_1$ being constant, and (ii) with respect to $a_1$, $a_0$ being constant, and to equate each result to zero, thus obtaining two equations from which the values of $a_0$ and $a_1$ can be determined.

$\sum (y - a_0 - a_1x)^2 = \sum y^2 + na_0^2 + a_1^2\sum x^2 - 2a_0\sum y - 2a_1\sum xy + 2a_0a_1\sum x$

(i) $2na_0 - 2\sum y + 2a_1\sum x = 0$, i.e., $na_0 + a_1\sum x = \sum y$

(ii) $2a_1\sum x^2 - 2\sum xy + 2a_0\sum x = 0$, i.e., $a_0\sum x + a_1\sum x^2 = \sum xy$

(b) Second-degree Parabolas

The general equation of a second degree parabola may be written $y = a_0 + a_1x + a_2x^2$.

As before, $a_0$, $a_1$ and $a_2$ have to be such values that $\sum (y - a_0 - a_1x - a_2x^2)^2$ is a minimum.

Proceeding as above, the following three normal equations are obtained:

(i) $\sum y = na_0 + a_1\sum x + a_2\sum x^2$, where n = no. of observations.

(ii) $\sum xy = a_0\sum x + a_1\sum x^2 + a_2\sum x^3$.

(iii) $\sum x^2y = a_0\sum x^2 + a_1\sum x^3 + a_2\sum x^4$.


Example XIII. Fit a second-degree parabola to the following data:

 x:    0     1     2     3     4
 y:    1     5    10    22    38

   x     x^2     x^3     x^4      y      xy     x^2 y
   0      0       0       0       1       0       0
   1      1       1       1       5       5       5
   2      4       8      16      10      20      40
   3      9      27      81      22      66     198
   4     16      64     256      38     152     608
  --     --     ---     ---      --     ---     ---
  10     30     100     354      76     243     851

Substituting these values in the normal equations we get:

$5a_0 + 10a_1 + 30a_2 = 76$ .... (i)

$10a_0 + 30a_1 + 100a_2 = 243$ .... (ii)

$30a_0 + 100a_1 + 354a_2 = 851$ .... (iii)

whence $a_0 = 1.42856$, $a_1 = 0.24288$, $a_2 = 2.21428$

i.e., $y = 1.429 + 0.243x + 2.214x^2$.

If the given values of x are equidistant, the mid-value of x, if n is odd, or the point mid-way between the two mid-values of x, if n is even, can be taken as origin. The sums of the odd powers of x then vanish, and the calculation is much simplified.

In the above example take u = x - 2.

The normal equations then become

$na_0 + a_2\sum u^2 = \sum y$ .... (i)

$a_1\sum u^2 = \sum uy$ .... (ii)

$a_0\sum u^2 + a_2\sum u^4 = \sum u^2y$ .... (iii)


Example XIV.

Data of Example XIII.

    u     u^2     u^4      y      uy     u^2 y
   -2      4      16       1      -2       4
   -1      1       1       5      -5       5
    0      0       0      10       0       0
   +1      1       1      22     +22      22
   +2      4      16      38     +76     152
   --     --      --      --     ---     ---
    0     10      34      76     +91     183

$5a_0 + 10a_2 = 76$

$10a_1 = 91$

$10a_0 + 34a_2 = 183$

i.e., $a_0 = 10.77144$, $a_1 = 9.1$, $a_2 = 2.21428$

giving $y = 10.771 + 9.1u + 2.2142u^2$

whence, by writing (x - 2) for u, we get

$y = 10.77144 + 9.1(x - 2) + 2.21428(x - 2)^2$

or $y = 1.42856 + 0.24288x + 2.21428x^2$ as before.
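The three normal equations for the parabola can be formed and solved mechanically. The following Python sketch is illustrative only: `solve_linear` is a small Gaussian elimination written for the purpose and `fit_parabola` is a hypothetical name; no outside library is assumed. The data are those of Example XIII.

```python
def solve_linear(matrix, rhs):
    """Solve a small system of linear equations by Gaussian elimination."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Bring the largest remaining coefficient into the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_parabola(xs, ys):
    """Return [a0, a1, a2] of the least-squares parabola y = a0 + a1 x + a2 x^2."""
    # power_sums[k] = sum of x^k (so power_sums[0] = n).
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    # moment_sums[k] = sum of x^k * y.
    moment_sums = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    matrix = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    return solve_linear(matrix, moment_sums)


a0, a1, a2 = fit_parabola([0, 1, 2, 3, 4], [1, 5, 10, 22, 38])
print(round(a0, 4), round(a1, 4), round(a2, 4))
```

Carried to full accuracy the coefficients are about 1.4286, 0.2429 and 2.2143; the slightly different final figures in the text result from rounding $a_2$ before substituting back.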

Example XV. Fit a straight line by the method of least squares to the following:

 x:      15        20        25        30
 y:   -1.200    +0.059    +0.766    +1.430

(L,IL B.Sc. Econ. 1945) 



    u     u^2       y         uy
   -3      9     -1.200     3.600
   -1      1     +0.059    -0.059
   +1      1     +0.766     0.766
   +3      9     +1.430     4.290
   --     --     ------     -----
    0     20     +1.055     8.656 - 0.059 = 8.597

whence for $y = a_0 + a_1u$, with $u = \frac{2}{5}x - 9$,

we get $y = 0.26375 + 0.42985u$

i.e., $y = 0.26375 + 0.42985(\frac{2}{5}x - 9)$

$y = 0.17194x - 3.6049$.

Say $y = 0.172x - 3.605$.


Example XVI. Fit a second degree parabola to the following data.

 X:      1       2       3       4       5
 Y:   1090    1220    1390    1625    1915

To reduce the arithmetic take u = (X - 3) and v = (Y - 1450)/5.

    u     u^2     u^4      v       uv     u^2 v
   -2      4      16     -72      144     -288
   -1      1       1     -46       46      -46
    0      0       0     -12        0        0
   +1      1       1     +35       35      +35
   +2      4      16     +93      186     +372
   --     --      --     ---      ---     ----
    0     10      34      -2      411      +73

$5a_0 + 10a_2 = -2$

$10a_1 = 411$

$10a_0 + 34a_2 = 73$

$10a_0 + 20a_2 = -4$

$14a_2 = 77$,  $a_2 = 5.5$

$a_0 = -11.4$

$v = -11.4 + 41.1u + 5.5u^2$

i.e., $\dfrac{Y - 1450}{5} = -11.4 + 41.1(X - 3) + 5.5(X - 3)^2$

$Y = 1393 + 205.5X - 616.5 + 27.5X^2 - 165X + 247.5$

$Y = 1024 + 40.5X + 27.5X^2$


Use of logarithms to determine lines of best fit 

Example XVII. The observed data are as follows:

 x:     1       2       3       4       5       6
 y:   2.98    4.26    5.21    6.10    6.80    7.50

When log y is plotted against log x the points lie approximately on a straight line.

Assuming that the equation sought is of the type $y = ax^r$, then $\log y = r \log x + \log a$.

Find a straight line of best fit by using the equations

$\sum \log y = r\sum \log x + n \log a$ .... (i)

$\sum (\log x \cdot \log y) = r\sum (\log x)^2 + \log a \sum \log x$ .... (ii)

   log x      (log x)^2      log y     log x log y
   0              0          0.4742        0
   0.3010     0.090601       0.6294     0.189449
   0.4771     0.227624       0.7168     0.341986
   0.6021     0.362524       0.7853     0.472829
   0.6990     0.488601       0.8325     0.581918
   0.7782     0.605595       0.8751     0.681003
   ------     --------       ------     --------
   2.8574     1.774945       4.3133     2.267185

$4.3133 = 2.8574r + 6 \log a$
$2.2672 = 1.7749r + 2.8574 \log a$
$r = 0.5144$
$a = 2.978$
$y = 2.978x^{0.5144}$
$y = 3x^{1/2}$ approximately.

Example XVIII. Fit an equation of the form $y = ab^x$ to the following data.

 x:     2       3       4       5       6
 y:   144    172.8   207.4   248.8   298.5

$\log y = \log a + (\log b)x$.

    x     x^2       y       log y     x log y
    2      4      144      2.1584     4.3168
    3      9      172.8    2.2375     6.7125
    4     16      207.4    2.3168     9.2672
    5     25      248.8    2.3959    11.9795
    6     36      298.5    2.4749    14.8494
   --     --               -------   -------
   20     90               11.5835   47.1254

$\sum \log y = \sum x \cdot \log b + n \log a$

$\sum x \log y = \sum x^2 \cdot \log b + \sum x \cdot \log a$

$11.5835 = 20 \log b + 5 \log a$
$47.1254 = 90 \log b + 20 \log a$
$b = 1.199$;  $a = 100$
i.e., $y = 100(1.2)^x$

It should be noted that when the method of least squares is 
applied as above it is the deviations of logy, and not of y, which are 
being minimized. This fact, however, does not, in general, affect 
the utility of the method. 
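Both logarithmic fits reduce to fitting a straight line after taking logarithms, as the Python sketch below illustrates for the data of Examples XVII and XVIII. The function names are chosen here for the illustration, and common logarithms are computed with the standard `math.log10` instead of being read from four-figure tables, so the last figure may differ slightly from the text.

```python
from math import log10


def fit_line(us, vs):
    """Return (intercept, slope) of the least-squares line v = intercept + slope*u."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    slope = (sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
             / sum((u - u_bar) ** 2 for u in us))
    return v_bar - slope * u_bar, slope


def fit_power_law(xs, ys):
    """Fit y = a * x**r by a straight line of log y on log x; returns (a, r)."""
    intercept, slope = fit_line([log10(x) for x in xs], [log10(y) for y in ys])
    return 10 ** intercept, slope


def fit_exponential(xs, ys):
    """Fit y = a * b**x by a straight line of log y on x; returns (a, b)."""
    intercept, slope = fit_line(xs, [log10(y) for y in ys])
    return 10 ** intercept, 10 ** slope


a_pow, r_pow = fit_power_law([1, 2, 3, 4, 5, 6],
                             [2.98, 4.26, 5.21, 6.10, 6.80, 7.50])
a_exp, b_exp = fit_exponential([2, 3, 4, 5, 6],
                               [144, 172.8, 207.4, 248.8, 298.5])
print(round(a_pow, 3), round(r_pow, 4), round(a_exp, 1), round(b_exp, 3))
```

As the text notes, what is minimized here is the sum of squared deviations of log y, not of y itself.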





'* Exercise 8 

J, Obtain the coefficient of correlation and the regression equations from 
the following data . 

x: 2 4 38 6 5 67 5 4 

y: 15 10 16 6 14 12 12 9 11 16 

2. The ranks of the same sixteen students in two tests were as follows:
the two numbers in brackets denote the ranks of a student in test I and
test II respectively. (1, 1), (2, 10), (3, 3), (4, 4), (5, 5), (6, 7), (7, 2), (8, 6),
(9, 8), (10, 11), (11, 15), (12, 9), (13, 14), (14, 12), (15, 16), (16, 13).

Calculate the rank correlation coefficient.

3. Find the coefficient of correlation between the number of pigs (X) and
the price of pork (Y).

        No. of pigs in England    Price of pork in England
        and Wales as % of no.     and Wales as % of price
Year.   in 1900 (X).              in 1900 (Y).
1896    122                        80
1897     98                       100
1898    103                       102
1899    110                        91
1900    100                       100
1901     91                       111
1902     97                       109
1903    113                       100
1904    121                        89
1905    102                       104
1906     97                       111
1907    111                       102
1908    119                        98
1909    101                       111
1910     99                       123

[Ministry of Agriculture and Fisheries]


4. Calculate the rank correlation coefficient between no. of pigs and price
of pork, from the data of Question 3.

5. Fit a straight line to the following data by the method of least squares.

x:   1   2   3   4   5   6   7   8   9

y:  10  15  20  27  31  35  30  35  40

6. Determine the correlation coefficient between x and y and the regression
equations.

x:  5    7    9    11   13   15

y:  1.7  2.4  2.8  3.4  3.7  4.4


7. Marks of 12 students in arithmetic and algebra tests. 

Arithmetic (X) : 60 34 40 50 45 40 22 43 42 66 64 46 

Algebra (Y) : 75 32 33 40 45 33 12 30 34 72 41 67 

Calculate the rank coefficient between X and Y. 





8. Compute the coefficient of correlation between the two series of Question 
7. 

9. Two regression lines, y — Zox and x = 0*19y, have been calculated 
from certain data. The numbers are stated correct to two significant figures. 
Make an estimate of the value of the correlation coefficient. 

10. Fit a straight line, by the method of least squares, to the following data.

x:  0    1    2    3    4    5     6     7     8

y:  9.8  7.6  6.1  4.2  3.1  -1.5  -3.2  -5.5  -7.4

11. Calculate the correlation coefficient between X and Y.

X:   9.2  10.6  10.3  13.8  17.5  20.1  16.9  14.6  17.0  10.0

Y:  10.5   9.2   9.7  11.5  12.5  13.0  14.3  15.0  13.9  17.4


12. Calculate the regression equations from the data of Question 11.

13. Find a linear law giving y (retail food price index) in terms of x (wholesale
food price index) from the following table:

      1928.  1929.  1930.  1931.  1932.  1933.  1934.  1935.  1936.  1937.
x:    89     86     74     65     65     63     66     67     72     79
y:    92     91½    84     75     73½    72     70½    75     77½    84

(L.U. B.Sc. Econ., 1938)

14. Data of Question 13. Calculate the coefficient of correlation between
x and y. Calculate also the standard error of the regression of y on x.


15. Working-class Households in Selected Boroughs

            Below minimum
            standard of       Overcrowded
Borough.    living (x).       (y).
            (No. per 200 households)
A           17                36
B           13                46
C           15                35
D           16                24
E            6                12
F           11                18
G           14                27
H            9                22
I            7                 2
J           12                 8

Compute the product sum correlation coefficient between poverty and overcrowding.
Also measure the relationship by the formula $1 - 6\Sigma d^2 / N(N^2 - 1)$,
where N is the number of boroughs and d is the difference between the ranks of
any borough ordered by poverty and overcrowding.

(L.U. B.Sc. Econ., 1932)

16. Fit a straight line, by the method of least squares, to the following
data:

x:  -12     -8     -4      0     4      8      12

y:  -10.75  -4.32  -2.51  -0.83  +3.61  +2.50  +10.00

17. Coal Consumed and Pig Iron Produced

         Coal consumed
         in blast furnaces    Pig iron produced
Year.    (mn. tons) (X).      (mn. tons) (Y).
1929     14.51                7.59
1930     11.69                6.19
1931      7.11                3.77
1932      6.53                3.57
1933      7.37                4.14
1934     10.47                5.97
1935     10.79                6.42
1936     12.84                7.72
1937     14.76                8.49
1938     11.56                6.76

From these figures obtain a formula giving pig-iron production in terms of
coal consumption. Would you expect this formula to apply in 1947?

(L.U. B.Sc. Econ., 1948)


18. Number of Pigs and Price of Bacon Pigs

Year:              1939  1940  1941  1942  1943  1944  1945  1946  1947  1948
No. of pigs (X):    100    93    58    49    42    42    49    44    37    49
Price (Y):          100   144   153   176   178   177   184   203   240   264

[Agricultural Statistics of the U.K.]

Obtain the correlation coefficient, r, and the regression equation, Y on X.

19. Depth of Water Applied in Feet (x) and Yield of Alfalfa in
Tons per Acre (y)

x:  1.0  1.5  2.0  2.5  3.0  3.5  4.0

y:  5.3  5.7  6.3  7.2  8.2  8.7  8.4

Find, by the method of least squares, the equations of the lines of regression
of y on x, and x on y. State the value of r, the coefficient of correlation.


20. Gross and Net Reproduction Rates for France

         1841-45  1846-50  1851-55  1856-60  1861-65  1866-70  1871-75
Gross:     177      169      165      168      171      171      168
Net:       104       99       95       98      103      102      100

         1876-80  1881-85  1886-90  1891-95  1896-1900  1901-05  1906-10
Gross:     169      165      153      154       141       137      127
Net:       108      105       99       97        98        98       95

Compute the correlation coefficient between these two measures of fertility
for the period 1841-1910. Explain the meaning of this correlation coefficient.

(L.U. B.Sc. Econ., 1946)

21.

Live weight of pig (lb.), X:           125  155  190  203  217

Weight of a side of bacon (lb.), Y:     31   46   51   65   72

Calculate equations of lines of best fit for (a) Y in terms of X and (b) X in
terms of Y.

22.

Depth of water in feet, X:    1.8  1.9  2.5  1.4  1.3  2.1  2.3

Yield in lbs. per plot, Y:    260  370  450  160   90  440  380

Calculate the regression equation of Y on X.

23. In ten areas the infant mortality (Y) and birth rates (X) are found to
be

X:  22.9  17.8  20.8  21.3  20.7  20.9  17.5  13.6  23.3  18.1

Y:  44    46    56    42    32    47    38    45    41    52

Calculate the regression equation of Y on X and discuss its meaning.

(L.U. B.Sc. Econ., 1949)

24. Marks of 30 Candidates in Algebra (X) and Geometry (Y) Tests

X:  88  50  36  45  41  32  43  42  66  64  46  39  34  39  20
Y:  75  40  24  45  33  22  31  35  72  41  55  32  28  26  20

X:  47  76  75  73  48  27  52  53  56  57  62  67  73  82  54
Y:  36  81  56  71  42  47  46  52  62  50  51  56  67  53  46


(i) Calculate r and b x from the actual scores 

(ii) Make a correlation table with intervals 20-29, 30-39, etc., for each
variable.

(iii) Calculate r and the regression equations from the table.
(iv) Draw a scatter diagram of the original data and insert the regression
equations thereon.

25.
                                     Marks for English (X)
Marks for General
Knowledge (Y)    41-45  46-50  51-55  56-60  61-65  66-70  71-75  76-80  81-85  Totals
41-                -      2      7     10      7      4      -      -      -      30
46-                1      8     16     31     17     24     11      3      -     111
51-                -      -      7     15     12      9      3      -      -      46
56-                -      -      4      6     10      8      2      2      1      33
61-                -      1      1      1      3      1      4      -      -      11
66-                -      -      2      -      1      2      -      -      -       5
71-                -      -      1      1      1      2      2      1      -       8
76-80              -      -      -      1      1      -      2      1      1       6
Totals             1     11     38     65     52     50     24      7      2     250

Find the correlation coefficient between marks for English and marks for
General Knowledge from the above data.





26.
                        X
Y          10-   20-   30-   40-   50-   Totals
20-         4     6    10     -     -      20
30-         2     5     9     4     -      20
40-         -     6    15    10     4      35
50-         -     1     7    12     3      23
60-         -     -     5     8     2      15
Totals      6    18    46    34     9     113

(i) Calculate r and the regression equations.
(ii) Calculate the standard errors of the regression equations.
(iii) Calculate the mean value of Y for each column, e.g. 1st col. $\bar{Y}_1 =
\{(25 \times 4) + (35 \times 2)\}/6 = 28.3$, and the mean value of X for each row.
Plot the points (mid X, mean Y), i.e., (15, 28.3), etc., and (mid Y, mean X);
insert the regression lines in your diagram.

27. Compute the correlation coefficient r from the following data.

                                  Variate x
Variate y    0-   5-   10-  15-  20-  25-  30-  35-  40-   Totals
0-           1    1    1    -    -    -    -    -    -       3
5-           -    2    3    2    -    -    -    -    -       7
10-          -    1    3    4    2    -    -    -    -      10
15-          -    -    2    5    5    2    1    -    -      15
20-          -    -    -    4    6    6    4    3    -      23
25-          -    -    -    1    7    5    3    2    -      18
30-          -    -    -    -    2    4    4    2    -      12
35-          -    -    -    -    -    2    4    1    1       8
40-          -    -    -    -    -    -    2    1    1       4
Totals       1    4    9    16   22   19   18   9    2     100

(L.U. B.Sc. Econ., 1937)

28. Data of Question 27: compute the regression equations and their
standard errors. Compute $\sqrt{b_1 b_2}$.

29. Data of Question 27: compute $\sigma^2_{(x-y)}$ by diagonal summation.


30. Performance of 132 Students in Test X and Test Y

                      X
Y         30-   40-   50-   60-   70-   Total
20-        2     5     3     -     -     10
30-        1     8    12     6     -     27
40-        -     5    22    14     1     42
50-        -     2    16     9     2     29
60-        -     1     8     6     1     16
70-        -     -     2     4     2      8
Total:     3    21    63    39     6    132

Calculate (i) the correlation coefficient,

(ii) the regression equations,

(iii) the standard errors of the regression equations.

31. Use the method of diagonal summation to calculate (i) $\sigma^2_{(x-y)}$ and
(ii) $\sigma^2_{(x+y)}$ from the table of Question 30, and use these results to check the
value of r.

32. Data of Question 30. Calculate (i) $\eta_{yx}$ and (ii) $\eta_{xy}$.

33. Data: Additional Exercises, p. 119: No. 7.

Calculate the coefficient of correlation between size of family and number
of rooms occupied.

34. Data: Additional Exercises: No. 34.

Calculate the coefficient of correlation between the number of earners and
number of non-earners in the family.

35. Data: Additional Exercises: No. 36.

Calculate the coefficient of correlation between family income and amount
expended on fuel. Check your answer by calculating $\sigma^2_{(x-y)}$ or $\sigma^2_{(x+y)}$.

36. A factory welfare service collects information concerning, amongst
other things, weight, height, education and type of work of the employees.
This information is summarized on an index card for each employee. The
following table contains information from thirty of the cards selected at
random, each row of the table being obtained from one card. The letter S
denotes the person was on the staff, the letter I that he was an industrial
worker. The letter E denotes that a person did attend a Secondary School,
the letter N that he did not.

Prepare a scatter diagram to show the connection between height and
weight in the sample and calculate the correlation coefficient. Carry out a
test to see whether there is any evidence of association between education
and type of work.

(R.S.S. Certificate, 1947)

(For second part of question, see Chapter X: $\chi^2$ test.)

Weight (to   Height (to   Type                Weight (to   Height (to   Type
nearest      nearest      of      Educa-      nearest      nearest      of      Educa-
5 lb.).      inch).       work.   tion.       5 lb.).      inch).       work.   tion.
145          67           S       N           145          67           S       E
155          71           S       E           145          71           I       E
160          70           S       E           165          66           S       E
130          68           S       E           135          70           I       E
145          69           I       N           120          67           I       N
150          69           S       E           135          69           I       N
125          64           I       E           135          64           I       E
165          71           S       N           140          71           S       E
130          71           S       E           160          74           I       E
145          70           I       N           120          70           S       N
135          68           I       N           145          69           I       N
170          75           S       E           180          70           I       N
140          69           I       N           160          73           I       N
150          68           S       N           135          66           I       N
115          66           I       N           130          72           I       N



37. Prove that if the regressions of y on x and x on y are both linear, then
the product of the two regression coefficients is equal to the square of the
product-moment correlation coefficient. In the table below verify that the
means of the x arrays are collinear, and also those of the y arrays, and deduce
that the correlation coefficient is -0.535.


" s ^ *• 0 1 2 3 4 

y- ^_ 

o — — 

i _ 

2—12 
3 1 12 

I 


(R.S.S. Certificate, 1947)

38. The following table shows the distribution of scores in two tests A and
B. Calculate:

(i) the correlation coefficient;

(ii) the regression equations;

(iii) the variance of (A - B), by diagonal summation;

(iv) the correlation ratios.


— 43 

18 36 9 

54 36 3 

18 4 


                                 A
B        10-   15-   20-   25-   30-   35-   40-   45-    f
10-       1     -     2     -     -     -     -     -      3
15-       2     3     2     2     2     1     -     -     12
20-       1     5     8     4     5     6     -     1     30
25-       -     2     6    10    12    15     2     2     49
30-       -     1     3     7     8     9     4     3     35
35-       -     -     -     1     4     3     6     4     18
40-       -     -     -     -     -     1     4     3      8
45-       -     -     -     -     -     -     2     3      5
f         4    11    21    24    31    35    18    16    160


39. Data : bivariate table of Example VII of this chapter. 
Compute the regression equations and their standard errors. 
40. Data: as for Question 33.

Compute the correlation ratios.


41.
                           Values of x.
                  7.    8.     9.    10.   11.   12.   Total.
No. of cases:     4     8      21    33    24    10    100
Average y:        5     5.25   6     7     7.5   8.4   -


From the above data calculate the correlation ratio of y on x t given that 
< 7 / «= 2-26. 



192 INTRODUCTION TO STATISTICAL CALCULATIONS 


42. 


X. 

12- 

13- 

14- 

15- 

16- 

17- 

18- 

19- 

Totals 

y- - 









i 

60- 

59- 


2 

2 

1 

4 

4 

3 

2 

— 

1 

17 

58- , 


— 

4 

9 

6 

6 

3 

— 

28 

57- 

1 — 

3 

6 

10 

7 

7 

5 

— 

38 

56 ~ 

1 

3 

4 

3 

6 

6 

7 

— 

30 

55 

— 

3 

— 

- 

— 

— 

5 

2 

10 

54- 

2 

2 

— 

- 

2 

2 

3 

3 

14 

53- 

6 

— 

— 


- 

— 

— 

2 

8 

52- 

3 

- 


— 

— 


— 

1 

4 

Totals 

12 

13 

16 

27 

25 

24 

25 

8 

150 


Compute r and rf u and 

43. Fit a second-degree parabola to the following data:

x:  0   1   2   3   4    5

y:  5   13  25  60  105  200

44. Fit a quadratic parabola to the following series of observations, taking
the year as the independent variable.

Year:                           1924  1927  1930  1933  1936  1939  1942
Index of Coal Export Prices:     187   142   133   129   136   169   279

Use your results to estimate the value of the index for 1935.

[R.S.S. Certificate: Specimen paper]

45. Compute the parabolic regression of y on x. Compute also the straight
line of best fit. Plot the original points and both regression lines on one
diagram.

Year (X):              1930  1931  1932  1933  1934  1935  1936  1937  1938  1939  1940

Production, tons (Y):  1560  1720  1640  1870  1790  2140  2060  2550  2700  2610  3000

46. Result of Fertilizer Experiment on Crop Results

Units of fertilizer used:    0    2    4    6    8   10

Units of yield:            110  113  118  119  120  118

Fit a parabola by the method of least squares to the above data, and estimate
for what fertilizer application the best results are obtained. If allowance is
made for fertilizer cost on the basis that the cost of 5 units of fertilizer equals
the value of one unit of yield, find the most profitable amount of fertilizer to
use.

(L.U. B.Sc. Econ., 1932)

47. Fit a second degree parabola to the data, i.e., compute the parabolic
regression of y on x.

x:  1     2     3     4     5

y:  1090  1220  1390  1625  1915



48. Use the method of least squares to obtain the constants of the equation
$y = a + bx + cx^2$ which describes these data:

Values of x:   -4   -3   -2   -1    0    1    2    3    4

Values of y:  101  104  107  107  108  110  109  110  108

(Inst. T. Assoc.)

49. Estimate the constants of the Pareto Curve, $n = Ax^{-\alpha}$, which fits the
data below.

1945-46 Number of Net Incomes more than £x after Tax

Income (£x).    No. (n).
150             14,000,000
500                825,000
1000               173,000
2000                35,500

(L.U. B.Sc. Econ., 1949)

50. Determine the constants of the curve $y = ax^n$ which best fits the data
given.

x:  2   3    4   5

y:  2   4.5  8   12.5

51.

x:  0      2      4      6      8      10     12     14

y:  114.5  130.1  150.1  171.0  196.7  225.2  257.9  295.2

Determine the constants of the curve $y = ar^x$ which best fits the above data.

52. Given that the normal equations for a cubic parabola, when x is centred,
are

$na_0 + a_2\Sigma u^2 = \Sigma y$
$a_1\Sigma u^2 + a_3\Sigma u^4 = \Sigma uy$
$a_0\Sigma u^2 + a_2\Sigma u^4 = \Sigma u^2 y$
$a_1\Sigma u^4 + a_3\Sigma u^6 = \Sigma u^3 y$

find the equation of the cubic parabola which fits the following data.

$u = (x - 2)$

x:  0   1     2   3      4

y:  4   4.25  6   10.75  20



CHAPTER VII 


CALCULATION OF MOMENTS 

I. Power Moments 

Definition and Symbolism 

Let $x_1, x_2, \ldots$ be values of the variable.

Then the 1st moment about zero $= \Sigma x / n = \bar{x}$.

The 2nd moment about zero $= \Sigma x^2 / n = S^2$ in previous pages, the
mean square deviation about 0.

The 1st moment about any arbitrary origin, A,

$= \Sigma (x - A) / n = \bar{x} - A$

$= c$ in previous pages.

The 2nd moment about any arbitrary origin, A,

$= \Sigma (x - A)^2 / n$

$= S^2$, mean square deviation about A.

The 3rd moment about any arbitrary origin, A,

$= \Sigma (x - A)^3 / n$, and so on.

Let $\nu_1$ = 1st moment about A
    $\nu_2$ = 2nd moment about A
    $\nu_3$ = 3rd moment about A, etc.

When the values of the variable have different frequencies,

$\nu_1 = \Sigma f(x - A) / N$, where $N = \Sigma f$,

$\nu_2 = \Sigma f(x - A)^2 / N$, etc.

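Second, third and higher moments are usually referred to the mean
of the distribution as origin.

Thus $\mu_2 = \Sigma (x - \bar{x})^2 / n$ or $\Sigma f(x - \bar{x})^2 / \Sigma f = \sigma^2$;

$\mu_3 = \Sigma (x - \bar{x})^3 / n$ or $\Sigma f(x - \bar{x})^3 / \Sigma f$; etc.

The rth moment about the mean $= \mu_r = \Sigma f(x - \bar{x})^r / \Sigma f$.

Note. $\mu_1 = \Sigma f(x - \bar{x}) / \Sigma f = 0$;

$\mu_0 = \Sigma f(x - \bar{x})^0 / \Sigma f = \Sigma f / \Sigma f = 1$.

Similarly, $\nu_0 = 1$.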

Moments are, in the first instance, usually calculated about an
arbitrary origin, A. It is necessary, therefore, to be able to transform
$\nu_2, \nu_3, \nu_4$ into $\mu_2, \mu_3, \mu_4$, which can be done by using the
following relationships:

(i) $\mu_2 = \nu_2 - \nu_1^2$. This is already familiar to the reader as
$\sigma^2 = S^2 - c^2$.

(ii) $\mu_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3$.

(iii) $\mu_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4$.

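These formulae can be obtained as follows, x being measured from A:

$\mu_r = \frac{1}{n}\Sigma (x - \bar{x})^r$;

by binomial expansion

$n\mu_r = \Sigma x^r - r\bar{x}\,\Sigma x^{r-1} + {}^rC_2\,\bar{x}^2\,\Sigma x^{r-2} - {}^rC_3\,\bar{x}^3\,\Sigma x^{r-3} + {}^rC_4\,\bar{x}^4\,\Sigma x^{r-4} - {}^rC_5\,\bar{x}^5\,\Sigma x^{r-5} + \ldots$

Dividing by n, and remembering that $\bar{x} = \nu_1$, and that $\nu_0 = 1$,
we get

$\mu_r = \nu_r - r\nu_1\nu_{r-1} + \frac{r(r-1)}{2!}\nu_1^2\nu_{r-2} - \frac{r(r-1)(r-2)}{3!}\nu_1^3\nu_{r-3} + \frac{r(r-1)\ldots(r-3)}{4!}\nu_1^4\nu_{r-4} - \frac{r(r-1)\ldots(r-4)}{5!}\nu_1^5\nu_{r-5} + \ldots$

$\mu_2 = \nu_2 - 2\nu_1^2 + \nu_1^2$, the other terms vanishing,
$= \nu_2 - \nu_1^2$

$\mu_3 = \nu_3 - 3\nu_1\nu_2 + 3\nu_1^2\nu_1 - \nu_1^3$, the other terms vanishing,
$= \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3$

$\mu_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 4\nu_1^3\nu_1 + \nu_1^4$
$= \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4$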
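
The binomial expansion above translates directly into a short routine. The following is a minimal sketch, not part of the original text: the function name `central_moment` and the list layout `raw[k]` (the k-th moment about the arbitrary origin, with `raw[0] = 1`) are assumptions made for illustration.

```python
from math import comb

def central_moment(raw, r):
    """r-th moment about the mean from moments about an arbitrary origin.

    raw[k] is the k-th moment about the origin A, with raw[0] = 1,
    so raw[1] is the distance of the mean from A.
    """
    v1 = raw[1]
    # binomial expansion of (x - mean)**r, term by term
    return sum(comb(r, k) * (-v1) ** k * raw[r - k] for k in range(r + 1))

# Moments 1, 4, 10, 45 about an arbitrary origin
raw = [1, 1, 4, 10, 45]
print([central_moment(raw, r) for r in (2, 3, 4)])
```

For r = 2, 3 and 4 the sum reduces to relationships (i), (ii) and (iii) respectively.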



Computation of Moments

Example I. From mean direct.

x.    f.     (x - 4).    f(x - 4).    f(x - 4)^2.    f(x - 4)^3.    f(x - 4)^4.
0     1      -4          -4           16             -64            256
1     8      -3          -24          72             -216           648
2     28     -2          -56          112            -224           448
3     56     -1          -56          56             -56            56
4     70     -           -140         -              -560           -
5     56     +1          +56          56             +56            56
6     28     +2          +56          112            +224           448
7     8      +3          +24          72             +216           648
8     1      +4          +4           16             +64            256
      256                +140         512            +560           2816

$\mu_1 = \frac{140 - 140}{256} = 0$

$\mu_2 = \frac{512}{256} = 2$

$\mu_3 = \frac{560 - 560}{256} = 0$

$\mu_4 = \frac{2816}{256} = 11$

Note that the odd moments, about the mean, vanish, as they obviously
must, for a symmetrical distribution.

Example II. Using an arbitrary origin: class interval = unity.

Mid-point
(x).    (f).    (x - 15).    f(x - 15).    f(x - 15)^2.    f(x - 15)^3.
10      9       -5           -45           225             -1125
11      36      -4           -144          576             -2304
12      75      -3           -225          675             -2025
13      105     -2           -210          420             -840
14      116     -1           -116          116             -116
15      107     -            -740          -               -6410
16      88      +1           +88           88              +88
17      66      +2           +132          264             +528
18      45      +3           +135          405             +1215
19      30      +4           +120          480             +1920
20      18      +5           +90           450             +2250
21      5       +6           +30           180             +1080
        700                  +595          3879            +7081

$\nu_1 = -\frac{145}{700} = -0.2071$

$\nu_2 = \frac{3879}{700} = 5.5414$

$\nu_3 = +\frac{671}{700} = +0.9586$

$\mu_2 = \nu_2 - \nu_1^2 = 5.5414 - (0.2071)^2 = 5.4985$

$\mu_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3$
$= +0.9586 - 3(-0.2071)(5.5414) + 2(-0.2071)^3$
$= +0.9586 + 3.4429 - 0.0178$
$= +4.3837$.
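
The arithmetic of Example II can be checked with a short script. This is a sketch under stated assumptions: the helper name `moments_about` is hypothetical, and because the script carries full precision its last figure may differ slightly from the hand working, which rounds each intermediate value to four places.

```python
def moments_about(origin, mids, freqs, unit=1.0, order=4):
    """Moments v_1 .. v_order about `origin`, expressed in class-interval units."""
    n = sum(freqs)
    return [sum(f * ((x - origin) / unit) ** r for x, f in zip(mids, freqs)) / n
            for r in range(1, order + 1)]

# Data of Example II: mid-points 10 to 21, arbitrary origin 15
mids = list(range(10, 22))
freqs = [9, 36, 75, 105, 116, 107, 88, 66, 45, 30, 18, 5]

v1, v2, v3, v4 = moments_about(15, mids, freqs)
mu2 = v2 - v1 ** 2                          # relationship (i)
mu3 = v3 - 3 * v1 * v2 + 2 * v1 ** 3        # relationship (ii)
print(round(v1, 4), round(v2, 4), round(v3, 4))
print(round(mu2, 4), round(mu3, 3))
```

Passing `unit=5` and `origin=32.5` would handle a table whose class interval is not unity, the moments then being in class-interval units.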

Example III. Arbitrary origin: class interval not unity.

The moments should always be calculated in class interval units.

                Unit = 5.
x.       f.     (x - 32.5).    f(x - 32.5).    f(x - 32.5)^2.    f(x - 32.5)^3.    f(x - 32.5)^4.
5-       3      -5             -15             75                -375              1,875
10-      15     -4             -60             240               -960              3,840
15-      45     -3             -135            405               -1215             3,645
20-      102    -2             -204            408               -816              1,632
25-      140    -1             -140            140               -140              140
30-      165    -              -554            -                 -3506             -
35-      120    +1             +120            120               +120              120
40-      67     +2             +134            268               +536              1,072
45-      25     +3             +75             225               +675              2,025
50-      10     +4             +40             160               +640              2,560
55-60    2      +5             +10             50                +250              1,250
         694                   +379            2091              +2221             18,159

$\nu_1 = -0.25216$, $\nu_2 = 3.01297$, $\nu_3 = -1.85157$, $\nu_4 = 26.16571$

$\mu_2 = 3.01297 - (-0.25216)^2 = 2.9494$

$\mu_3 = -1.85157 - 3(-0.25216)(3.01297) + 2(-0.25216)^3$
$= +0.3956$

$\mu_4 = 25.435$.

Two important constants of a distribution are calculated from
$\mu_2$, $\mu_3$ and $\mu_4$. They are

(i) $\beta_1 = \mu_3^2 / \mu_2^3$

(ii) $\beta_2 = \mu_4 / \mu_2^2$

For the distribution of Example I, $\beta_1 = 0/8 = 0$; $\beta_2 = 11/4 =
2.75$. When a distribution is symmetrical, moments of odd order
vanish. $\sqrt{\beta_1}$, i.e., $\mu_3 / \mu_2^{3/2}$, is used as a measure of skewness; it is
obviously zero for a symmetrical distribution. The meaning to be
attached to values of $\beta_2$ will be referred to in Chapter IX.
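
As a quick check of the two constants, the following sketch evaluates them for a symmetrical distribution with second, third and fourth moments about the mean equal to 2, 0 and 11. The function name `beta_coefficients` is an assumption made for illustration.

```python
def beta_coefficients(mu2, mu3, mu4):
    """Return (beta_1, beta_2) from the moments about the mean."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    return beta1, beta2

# mu_2 = 2, mu_3 = 0, mu_4 = 11
print(beta_coefficients(2, 0, 11))
```

Both constants are ratios of like powers of the unit, which is why they do not change when the moments are converted from class-interval units to ordinary units.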

Sheppard’s Corrections to Moments for Errors Due to Grouping 

When calculating the moments of a distribution it is assumed that
the members of any class have as their mean the mid-point of the
class interval. Actually the mean of the members is nearer to the
distribution mean than the mid-point, as can easily be seen if a
distribution curve is mentally substituted for the histogram.
Hence each deviation, $|x_i - \bar{x}|$, is in excess of the true deviation.
Sheppard's corrections are applied in the case of symmetrical or
moderately skew distributions * to compensate for this. The mean,
$\nu_1$, and third moment, $\mu_3$, need no correction, since in their cases the
errors are compensating.

Corrected 2nd moment $= \mu_2 - h^2/12$

Corrected 4th moment $= \mu_4 - \tfrac{1}{2}h^2\mu_2 + \tfrac{7}{240}h^4$

where h is the width of the class-interval.
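
A minimal sketch of the corrections in their standard form (second moment less h²/12; fourth moment less ½h²μ₂ plus 7h⁴/240, with μ₂ the uncorrected value). The function name `sheppard_corrected` is hypothetical, and the input values are only an illustration in class-interval units, so that h = 1.

```python
def sheppard_corrected(mu2, mu4, h=1.0):
    """Sheppard's corrections for grouping; h is the class-interval width.

    mu2 and mu4 are the uncorrected moments about the mean, in the same
    units as h. The mean and the third moment are left uncorrected.
    """
    mu2_corrected = mu2 - h ** 2 / 12
    mu4_corrected = mu4 - (h ** 2 / 2) * mu2 + 7 * h ** 4 / 240
    return mu2_corrected, mu4_corrected

# Illustration only: mu_2 = 2.9494, mu_4 = 25.435 in class-interval units
m2, m4 = sheppard_corrected(2.9494, 25.435)
print(round(m2, 4), round(m4, 4))
```

Whether the corrections are appropriate for a given table is a separate question; as the footnote warns, they must not be applied indiscriminately.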

Example IV. Calculate the moments about the mean of the distribution.
Calculate also $\beta_1$ and $\beta_2$. x values, in cms., are the mid-points of
intervals.

x:  2.0  2.5  3.0  3.5  4.0  4.5  5.0

f:  5    38   65   92   70   40   10

              Unit = 0.5.
x.     f.     (x - 3.5).    f(x - 3.5).    f(x - 3.5)^2.    f(x - 3.5)^3.    f(x - 3.5)^4.
2.0    5      -3            -15            45               -135             405
2.5    38     -2            -76            152              -304             608
3.0    65     -1            -65            65               -65              65
3.5    92     -             -              -                -                -
4.0    70     +1            +70            70               +70              70
4.5    40     +2            +80            160              +320             640
5.0    10     +3            +30            90               +270             810
       320                  +24            582              +156             2598

$\nu_1 = 0.075$, $\nu_2 = 1.81875$, $\nu_3 = 0.4875$, $\nu_4 = 8.11875$

$\mu_2 = 1.81875 - (0.075)^2 = 1.813125$

$\mu_3 = 0.4875 - 3(0.075)(1.81875) + 2(0.075)^3$
$= +0.0791$

$\mu_4 = 8.033$

$\beta_1 = (0.0791)^2 / (1.8131)^3 \approx 0.001$

$\beta_2 = 8.033 / (1.8131)^2 \approx 2.44$.

* Sheppard's corrections must not be applied indiscriminately. See An
Introduction to the Theory of Statistics (Yule & Kendall, Griffin & Co.,
1945), page 141.




In ordinary units

$\mu_2 = 1.8131 \times (\tfrac{1}{2})^2$
$\mu_3 = 0.0791 \times (\tfrac{1}{2})^3$
$\mu_4 = 8.033 \times (\tfrac{1}{2})^4$

$\beta_1$ and $\beta_2$ are, of course, unaffected by the change of unit.

Example V. The first four moments of a distribution about x = 4
are 1, 4, 10, 45. Show that the mean is 5 and calculate the moments
(i) about x = 0, (ii) about x = 5.

A = Arbitrary origin = 4

$\nu_1 = 1 = \bar{x} - A$

$\therefore$ mean $= 4 + 1 = 5$.

Moments about the Mean

$\mu_2 = \nu_2 - \nu_1^2 = 4 - 1 = 3$

$\mu_3 = 10 - 3(1)(4) + 2(1^3) = 0$

$\mu_4 = 45 - 4(1)(10) + 6(1^2)(4) - 3(1^4) = 26$.

Moments about Zero

$\nu_1 = 5$

$\nu_2 = \mu_2 + \nu_1^2 = 3 + 25 = 28$

$\nu_3 = \mu_3 + 3\nu_1\nu_2 - 2\nu_1^3$
$= 0 + 3(5)(28) - 2(5)^3$
$= 420 - 250 = 170$

$\nu_4 = \mu_4 + 4\nu_1\nu_3 - 6\nu_1^2\nu_2 + 3\nu_1^4$
$= 26 + 4(5)(170) - 6(25)(28) + 3(625)$
$= 1101$.


II. Factorial Moments 

The moments, $\nu_r = \frac{1}{N}\Sigma f(x - A)^r$ about any arbitrary origin A,
and $\mu_r = \frac{1}{N}\Sigma f(x - \bar{x})^r$ about the mean, are known as "power"
moments.

"Factorial" moments about zero are defined by $\nu_{(r)} = \frac{1}{N}\Sigma f x^{(r)}$,
where $x^{(r)} = x(x - 1)(x - 2) \ldots (x - r + 1)$.

Since factorial moments may be calculated by simple summation 
methods, their computation may be easier in some cases than that 
of power moments; but since they have in elementary work to be 
transformed into their power equivalents, their advantage in such 
work is limited. The following examples are included for purposes 
of illustration. 



Example VI. Calculate the first four factorial moments of the
following distribution by the direct method:

(Mid-points) x:  5  6  7  8  9

f:               1  2  3  2  2

                                                 fx^(4) =
x.    f.    fx.    fx(x-1).    fx(x-1)(x-2).     fx(x-1)(x-2)(x-3).
5     1     5      20          60                120
6     2     12     60          240               720
7     3     21     126         630               2,520
8     2     16     112         672               3,360
9     2     18     144         1,008             6,048
      10    72     462         2,610             12,768

$\nu_{(1)} = \frac{72}{10} = 7.2 = \bar{x}$

$\nu_{(2)} = 46.2$, $\nu_{(3)} = 261$, $\nu_{(4)} = 1276.8$


Transformation of Factorial into Power Moments

$\nu_{(1)} = \frac{1}{N}\Sigma f x^{(1)} = \frac{1}{N}\Sigma f x = \nu_1 = \bar{x}$

$\nu_{(2)} = \frac{1}{N}\Sigma f x(x - 1) = \frac{1}{N}(\Sigma f x^2 - \Sigma f x) = \nu_2 - \nu_1$

$\nu_{(3)} = \frac{1}{N}\Sigma f x(x - 1)(x - 2) = \frac{1}{N}(\Sigma f x^3 - 3\Sigma f x^2 + 2\Sigma f x)$
$= \nu_3 - 3\nu_2 + 2\nu_1$

$\nu_{(4)} = \nu_4 - 6\nu_3 + 11\nu_2 - 6\nu_1$.

From the above we obtain

$\nu_2 = \nu_{(2)} + \bar{x}$

$\nu_3 = \nu_{(3)} + 3\{\nu_{(2)} + \nu_1\} - 2\nu_1$
$= \nu_{(3)} + 3\nu_{(2)} + \bar{x}$

$\nu_4 = \nu_{(4)} + 6(\nu_{(3)} + 3\nu_{(2)} + \nu_{(1)}) - 11(\nu_{(2)} + \nu_{(1)}) + 6\bar{x}$
$= \nu_{(4)} + 6\nu_{(3)} + 7\nu_{(2)} + \bar{x}$

$\nu_1 = 7.2$, $\nu_2 = 46.2 + 7.2 = 53.4$

$\nu_3 = 261.0 + 138.6 + 7.2 = 406.8$

$\nu_4 = 1276.8 + 1566 + 323.4 + 7.2 = 3173.4$.

These can then be transferred to the mean, giving

$\mu_2 = 1.56$
$\mu_3 = -0.144$
$\mu_4 = 4.939$.
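
The direct method and the transformation can be sketched together as below. The function names are hypothetical; the data are those of Example VI (mid-points 5 to 9 with frequencies 1, 2, 3, 2, 2), and the results are floating-point values, so they should be compared with the hand figures to a tolerance rather than exactly.

```python
def falling_factorial(x, r):
    """x(x - 1)(x - 2) ... (x - r + 1)."""
    out = 1
    for k in range(r):
        out *= x - k
    return out

def factorial_moments(xs, fs, order=4):
    """Factorial moments about zero by the direct method."""
    n = sum(fs)
    return [sum(f * falling_factorial(x, r) for x, f in zip(xs, fs)) / n
            for r in range(1, order + 1)]

def power_from_factorial(f1, f2, f3, f4):
    """Power moments about zero from the first four factorial moments."""
    v1 = f1
    v2 = f2 + f1
    v3 = f3 + 3 * f2 + f1
    v4 = f4 + 6 * f3 + 7 * f2 + f1
    return v1, v2, v3, v4

fm = factorial_moments([5, 6, 7, 8, 9], [1, 2, 3, 2, 2])
v1, v2, v3, v4 = power_from_factorial(*fm)
mu2 = v2 - v1 ** 2
mu3 = v3 - 3 * v1 * v2 + 2 * v1 ** 3
mu4 = v4 - 4 * v1 * v3 + 6 * v1 ** 2 * v2 - 3 * v1 ** 4
print([round(m, 3) for m in (mu2, mu3, mu4)])
```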





Factorial Moments by Summation 

Example VII. Data from Example II of this chapter. 


M.p.
x.     f.     Σ^0.    Σ^1.     Σ^2.     Σ^3.      Σ^4.
10     9      700     -        -        -         -
11     36     691     3,355    -        -         -
12     75     655     2,664    8,287    -         -
13     105    580     2,009    5,623    13,734    -
14     116    475     1,429    3,614    8,111     16,641
15     107    359     954      2,185    4,497     8,530
16     88     252     595      1,231    2,312     4,033
17     66     164     343      636      1,081     1,721
18     45     98      179      293      445       640
19     30     53      81       114      152       195
20     18     23      28       33       38        43
21     5      5       5        5        5         5
r!     1      1       1!       2!       3!        4!

Summing may be either up or down; i.e., the origin may be taken at
either end of the table. Here the origin has been taken at 10.

After $\Sigma^0$ each summation ceases one interval lower than in the
preceding column.

The sums at the heads of the successive columns have the following
meaning:

$\Sigma^0 = \Sigma f = N = 700$
$\Sigma^1 = N\nu_{(1)} = 3355$
$\Sigma^2 = N\nu_{(2)}/2!$, $\Sigma^3 = N\nu_{(3)}/3!$, $\Sigma^4 = N\nu_{(4)}/4!$

$\nu_{(1)} = \frac{3355}{700} = 4.79286$

$\nu_{(2)} = \frac{2(8287)}{700} = 23.67714$

$\nu_{(3)} = \frac{6(13734)}{700} = 117.72$

$\nu_{(4)} = \frac{24(16641)}{700}$

$\therefore$ mean $= 10 + 4.793 = 14.793$

$\nu_2 = \nu_{(2)} + \nu_1 = 28.47000$

$\mu_2 = \nu_2 - (\nu_1)^2 = 28.47 - (4.79286)^2$
$= 5.4985$

$\nu_3 = \nu_{(3)} + 3\nu_{(2)} + \bar{x}$
$= 117.72 + 71.03142 + 4.79286$
$= 193.54428$

$\mu_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3$
$= 193.54428 + 220.19816 - 409.3581$
$= 4.384$
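
The repeated summation of Example VII can be sketched as follows. The function name `factorial_moments_by_summation` is hypothetical; the sketch assumes classes one unit apart with the origin at the first class (mid-point 10), sums from the far end of the table towards the origin, and stops each new column one class short of the previous one, as described above.

```python
from math import factorial

def factorial_moments_by_summation(freqs, order=4):
    """Return (column heads, factorial moments) by successive summation.

    freqs[0] belongs to the class taken as origin. The head of the r-th
    summation column equals N * v_(r) / r!.
    """
    n = sum(freqs)
    column = list(freqs)
    heads = []
    for r in range(order + 1):
        running = 0
        summed = [0] * len(column)
        # cumulate from the last class back to the class r steps from the origin
        for i in range(len(column) - 1, r - 1, -1):
            running += column[i]
            summed[i] = running
        heads.append(summed[r])
        column = summed
    moments = [factorial(r) * heads[r] / n for r in range(1, order + 1)]
    return heads, moments

freqs = [9, 36, 75, 105, 116, 107, 88, 66, 45, 30, 18, 5]
heads, fm = factorial_moments_by_summation(freqs)
print(heads)
print([round(m, 5) for m in fm[:3]])
```

The table must contain more classes than the highest order of moment required, otherwise a column has no head to read.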




The amount of arithmetical work may be somewhat reduced by 
summing from both ends of the table towards an origin near the 
mean. The method is illustrated below: 


Example VIII. Data as for Example VII. 


M.p.
x.     f.     Σ^0.    Σ^1.    Σ^2.     Σ^3.     Σ^4.
10     9      9       9       9        9        9
11     36     45      54      63       72       81
12     75     120     174     237      309      390
13     105    225     399     636      945      1335
14     116    341     740     1376     2321     3656
15     107    359
16     88     252     595
17     66     164     343     636
18     45     98      179     293      445
19     30     53      81      114      152      195
20     18     23      28      33       38       43
21     5      5       5       5        5        5
r!            1       1!      2!       3!       4!

The method of summation will be obvious. The factorial moments
are obtained from the underlined figures as follows:





Thus $\nu_3 = \nu_{(3)} + 3\nu_{(2)} + \bar{x}$
$= -16.080 + 17.2467 - 0.20718$
$= 0.959$

which gives $\mu_3 = 4.384$ as before.

III. The Moments of a Continuous Distribution 

The methods of calculating the moments just described are 
applicable to discrete distributions only. A general illustration of 
the derivation of the moments of continuous distributions is much 
beyond the scope of this book. The following examples, however, 
show how moments may be obtained in some simple cases, and 
further illustrations are given in Chapter IX. 

$\nu_r = \int_a^b x^r f(x)\,dx$, $\mu_r = \int_a^b (x - \bar{x})^r f(x)\,dx$, when the area of the
distribution curve is unity; otherwise

$\nu_r = \frac{1}{\text{Area}}\int_a^b x^r f(x)\,dx$ and $\mu_r = \frac{1}{\text{Area}}\int_a^b (x - \bar{x})^r f(x)\,dx$.


Moments of a 2nd Degree Parabola 

Example IX. Let the positive portion of the curve $y = 2ax - x^2$
represent a continuous frequency distribution. Calculate

(i) the first three moments: (a) about zero, and (b) about the mean;

(ii) the mean deviation and the quartile deviation.

x has all values from 0 to 2a.

Area $= \int_0^{2a} (2ax - x^2)\,dx = \frac{4a^3}{3}$

$\bar{x} = \nu_1 = \frac{3}{4a^3}\int_0^{2a} x f(x)\,dx = \frac{3}{4a^3}\left(\frac{16}{3}a^4 - 4a^4\right) = a$

$\nu_2 = \frac{3}{4a^3}\int_0^{2a} x^2 f(x)\,dx = \frac{3}{4a^3}\left(8a^5 - \frac{32}{5}a^5\right) = \frac{6}{5}a^2$

$\nu_3 = \frac{3}{4a^3}\int_0^{2a} x^3 f(x)\,dx = \frac{8}{5}a^3$

The formulae for converting moments about an arbitrary origin into
moments about the mean must now be applied.

Thus $\mu_2 = \nu_2 - \bar{x}^2 = \frac{6}{5}a^2 - a^2 = \frac{a^2}{5}$

$\mu_3 = \nu_3 - 3\bar{x}\nu_2 + 2\bar{x}^3$
$= \frac{8}{5}a^3 - \frac{18}{5}a^3 + 2a^3 = 0$

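Transfer the origin to (a, 0). Then $y = (a^2 - x^2)$, $-a \le x \le a$, and

$\mu_2 = \frac{3}{4a^3}\int_{-a}^{a} x^2 (a^2 - x^2)\,dx = \frac{a^2}{5}$.

Mean Deviation

The sum of the deviations from the mean disregarding their signs is
twice the sum of the positive deviations.

$\therefore$ Mean deviation $= 2 \times \frac{3}{4a^3}\int_0^a x f(x)\,dx = 2 \times \frac{3}{4a^3}\left[\frac{a^2x^2}{2} - \frac{x^4}{4}\right]_0^a = \frac{3a}{8}$

Note that $\sigma$ : mean deviation $\approx$ 1 : 0.84.

Quartile Deviation

Let the Q.D. $= \lambda$.

Then $\int_0^\lambda f(x)\,dx = \frac{1}{4}$ area $= \frac{a^3}{3}$,

whence $\lambda \approx 0.3473$ when a = 1, and generally $\lambda = 0.347a$.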
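
For a polynomial frequency curve the integrals can be evaluated exactly, which gives a convenient check on the moments of the parabola. The sketch below takes a = 1; the helper names `integrate_poly` and `raw_moment` are assumptions made for illustration, and exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction

def integrate_poly(coeffs, lo, hi):
    """Exact integral of sum(coeffs[k] * x**k) from lo to hi."""
    def antiderivative(x):
        return sum(Fraction(c) * Fraction(x) ** (k + 1) / (k + 1)
                   for k, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)

def raw_moment(coeffs, lo, hi, r):
    """r-th moment about zero of the curve, divided by the area under it."""
    area = integrate_poly(coeffs, lo, hi)
    shifted = [0] * r + list(coeffs)       # multiply the polynomial by x**r
    return integrate_poly(shifted, lo, hi) / area

a = 1
parabola = [0, 2 * a, -1]                  # y = 2ax - x^2 on 0 <= x <= 2a
v1, v2, v3 = (raw_moment(parabola, 0, 2 * a, r) for r in (1, 2, 3))
mu2 = v2 - v1 ** 2
mu3 = v3 - 3 * v1 * v2 + 2 * v1 ** 3
print(v1, v2, v3, mu2, mu3)
```

With a = 1 the values agree with the general results a, 6a²/5, 8a³/5, a²/5 and 0.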

Example X. If $f(x) = \frac{3}{4}(1 - x^2)$, find the area of the distribution
included between the ordinates at $x = 0$ and $x = +\sigma$.

The area of the distribution is 1 and $\sigma = \sqrt{0.2}$.

$\therefore$ required area $= \frac{3}{4}\int_0^{\sqrt{0.2}} (1 - x^2)\,dx = \frac{3}{4}\left[x - \frac{x^3}{3}\right]_0^{\sqrt{0.2}} = 0.313$,

i.e., 31.3% of the total area under the curve.


The Triangular Distribution 

Example XI. The distribution is an isosceles triangle:

$-a \le x \le +a$. Height of triangle = b.

Calculate the mean, standard and quartile deviations.

Area of triangle = ab.

$\bar{x} = 0$.

Equation of R.H.S. is $y = b - \frac{b}{a}x$.

Mean Deviation

M.D. $= 2\int_0^a x\left(b - \frac{b}{a}x\right)dx \div ab = \frac{a}{3}$

Standard Deviation

$\sigma^2 = \frac{2}{ab}\int_0^a x^2\left(b - \frac{b}{a}x\right)dx = \frac{a^2}{6}$, so that $\sigma = a\sqrt{\tfrac{1}{6}}$

Quartile Deviation

$\int_0^\lambda \left(b - \frac{b}{a}x\right)dx = \frac{ab}{4}$, i.e., $\lambda^2 - 2a\lambda + \frac{a^2}{2} = 0$

$\lambda = 0.293a$

$\sigma$ : M.D. : Q.D. = 0.408a : 0.333a : 0.293a
= 1 : 0.82 : 0.72.
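
The three measures for the triangular distribution can be checked numerically. The following is a rough sketch taking a = 1, with the triangle scaled to unit area so that the height b drops out; the midpoint-rule helper and the bisection loop are illustrative choices, not the method of the text.

```python
import math

def midpoint_integral(g, lo, hi, steps=20_000):
    """Midpoint-rule approximation to the integral of g from lo to hi."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))

a = 1.0

def density(x):
    # isosceles triangle on -a <= x <= a, scaled to unit area
    return (1 - abs(x) / a) / a

variance = midpoint_integral(lambda x: x * x * density(x), -a, a)
mean_dev = midpoint_integral(lambda x: abs(x) * density(x), -a, a)

# quartile deviation: the point lam with a quarter of the area between 0 and lam
lo, hi = 0.0, a
for _ in range(60):
    lam = (lo + hi) / 2
    if midpoint_integral(density, 0.0, lam, steps=2_000) < 0.25:
        lo = lam
    else:
        hi = lam

sigma = math.sqrt(variance)
print(round(sigma, 3), round(mean_dev, 3), round(lam, 3))
print(round(mean_dev / sigma, 2), round(lam / sigma, 2))
```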


The Rectangular Distribution 

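Example XII. Let length of base = L, i.e., $0 \le x \le L$. Then, since
area is to be unity, $f(x) = \frac{1}{L}$.

Taking moments about the origin

$\nu_1 = \frac{1}{L}\int_0^L x\,dx = \tfrac{1}{2}L = \bar{x}$

$\nu_2 = \frac{1}{L}\int_0^L x^2\,dx = \tfrac{1}{3}L^2$

$\nu_3 = \frac{1}{L}\int_0^L x^3\,dx = \tfrac{1}{4}L^3$

whence $\mu_2 = \tfrac{1}{3}L^2 - \tfrac{1}{4}L^2 = \tfrac{1}{12}L^2$, etc.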

Example XIII. Taking origin at 0^ show that 





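Example XIV.

Let x have all values from $-\tfrac{1}{2}$ to $+\tfrac{1}{2}$.

Then

$$\mu_r = \int_{-1/2}^{1/2} x^r\,dx = \left[\frac{x^{r+1}}{r+1}\right]_{-1/2}^{1/2},$$

e.g., $\mu_2 = \left[\dfrac{x^3}{3}\right]_{-1/2}^{1/2} = \dfrac{1}{12}$ and $\sigma = \dfrac{1}{\sqrt{12}} = 0.2886$.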


Exercise 7 

1. Calculate the first four moments of the following distribution about
x = 4, and thence find the moments about the mean of the distribution.
Find also the values of $\beta_1$ and $\beta_2$.

x:   0   1   2   3    4    5    6   7   8   9  10
f:   5  10  30  70  140  200  140  70  30  10   5     Σf = 710

2. The first four moments of a distribution about x = 2 are 1, 2.5, 5.5 and
16. Calculate the four moments about $\bar{x}$ and about zero.

3. Number of Dairy Farms in England and Wales, Classified
According to Cost of Production of Milk (1935-36)

Cost of production
(pence per gal.), x:   4-  5-  6-  7-  8-  9-  10-  11-  12-  13-  14-  15-
No. of farms, f:        4   9  34  77  94  88   65   40   15    4    5    2     Σf = 437

State the values of $\mu_2$ and $\mu_3$.

4. Height (in ins.), (x):   30-  32-  34-  36-  38-  40-  42-  44-  46-  48-  50-
   No. of boys (f):           3    6   30  107  356  543  338   92   20    4    1     Σf = 1500

What is the mean height? Calculate the second, third and fourth moments
about the mean, stating your results in class-interval units.

5. x:   40-  41-  42-  43-  44-  45-  46-  47-  48-
   f:     1    2    4    7   12   20   38   52   40

   x:   49-  50-  51-  52-  53-  54-
   f:    29   19   14    6    4    2     Σf = 250

Calculate the 1st four moments about 47.5. Convert these results into
moments about the mean and calculate $\beta_1$ and $\beta_2$.

6. x:   2.5-  7.5-  12.5-  17.5-  22.5-  27.5-  32.5-
   f:     2     5     10     18      8      6      1     Σf = 50

Calculate $\mu_2$, $\mu_3$ and $\mu_4$.

7. Calculate the values of $\beta_1$ and $\beta_2$ from the following data:

x:   10-  12-  14-  16-  18-  20-  22-  24-  26-
f:     3   30  110  218  275  222  108   32    2     Σf = 1000

8. The relative frequency density of a continuous distribution is given by
$f(x) = \tfrac{3}{256}(16 - x^2)$, where x has all values from −4 to +4. Find the area
under the curve and the values of $\mu_2$ and $\mu_3$. Find also the mean deviation
and the quartile deviation.




9. Transfer the origin of the above function to the point (−4, 0) and hence
find the first three moments about zero.

10. Show that moments about the origin may be derived from moments
about the mean by the following formulae:

$$\nu_2 = \mu_2 + \bar{x}^2$$
$$\nu_3 = \mu_3 + 3\bar{x}\mu_2 + \bar{x}^3$$
$$\nu_4 = \mu_4 + 4\bar{x}\mu_3 + 6\bar{x}^2\mu_2 + \bar{x}^4$$

11. Calculate the moments about zero for the distribution of Question 7
above.

12. Mid-point x:   30  31  32  33  34  35  36  37  38  39  40  41  42
    f:              2   6  17  37  65  90  97  82  56  30  11   5   2     Σf = 500

Calculate $\mu_2$, $\mu_3$ and also $\beta_1$ and $\beta_2$.

13. A frequency distribution is represented by the portion of the curve
$y = 12x + x^2 - x^3$ above the axis of x.

Compute (a) the area under the curve, i.e., $0 \le x \le 4$;

(b) $\nu_1$ and $\nu_2$ about zero;

(c) $\mu_2$ and σ.

14. A rectangular distribution has all values of x from x = 0 to x = 2L.
The area of the distribution is unity. 

Compute (a) v 2 , v 3 and v x about x — 0; 

(b) /a 2 , and 

(c ) the mean deviation and the standard deviation. 

15 A frequency distribution is represented by the two sides of an isosceles 

triangle and the included base x has all values from — — to 4- —. The 

mm 

* 

equation of the right-hand side of the triangle is y = mx + c.

Calculate (1) the area of the triangle, (2) $\mu_2$.



CHAPTER VIII 


ELEMENTARY PROBLEMS IN PROBABILITY 
I. Definitions 

Statistical Frequencies as Measures of Probability 

Let a frequency distribution have a total of N items, with t classes. 
Then, if the number of items in the rth class is $f_r$, the chance that
any specified item is in the rth class is, in the absence of any other
information, taken to be $\dfrac{f_r}{N}$, say $p_r$.

The chance that the item is not in the rth class is $\dfrac{N - f_r}{N}$, say $q_r$,
where $p_r + q_r = 1$, or certainty. The odds in favour of the event are
$p_r : q_r$ and the odds against it $q_r : p_r$.

Consider the following frequency distribution: 


  (x)      (f)       f/N = p
  x_1      f_1       f_1/N = p_1      Probability that item is in 1st class
  x_2      f_2       f_2/N = p_2           ,,        ,,        ,,    2nd  ,,
  x_3      f_3       f_3/N = p_3           ,,        ,,        ,,    3rd  ,,
  ...      ...       ...
  x_r      f_r       f_r/N = p_r           ,,        ,,        ,,    rth  ,,

                     Σf/N = Σp = 1


The following statements of probability should be carefully 
followed: 

(i) The probability that a specified item is in the first or second
class = $p_1 + p_2$, i.e., the separate chances must be added.

(ii) The chance that a specified item is in the first class and that
another specified item is in the third class = $p_1 \times p_3$, i.e., the
separate chances must be multiplied.

(iii) The chance that both items are in the third class = $p_3 \times p_3$,
and consequently

(iv) The chance that neither of them is in the third class = $(1 - p_3)(1 - p_3)$.

This is often written $q_3 \times q_3$, where q is the probability of an event
not taking place; i.e., (p + q) = 1, or certainty.


(v) The chance that one or both, i.e., at least one, of them is in
the third interval is the sum of the following chances:

(a) Chance that first item is and second item is not: $p_3(1 - p_3)$.

(b) Chance that second item is and first item is not: $p_3(1 - p_3)$.

(c) Chance that both are: $(p_3)^2$.
i.e., the probability = $2p_3 - p_3^2$.

(d) The remaining possible event is that neither of them is
in the third class, i.e., $(1 - p_3)^2$.

The sum of (a), (b), (c), (d), if correctly stated, must add up
to 1, or certainty, since all possible events have been included.

$$2p_3 - p_3^2 + (1 - p_3)^2 = 1.$$

Such a problem may be dealt with more briefly as follows:

The chance that neither of the items is in the third class
$= (1 - p_3)^2$.

Chance that one, or other, or both are
$= 1 - (1 - p_3)^2 = 2p_3 - p_3^2$.

(vi) The probability that if 10 items are picked at random, i.e.,
so that each of the N items has an equal chance of being included
amongst the 10, they will all be in the fifth class ($f_5$ being greater
than 10) is $p_5^{10}$, and the probability that none of them will be in
the fifth class is $q_5^{10}$ or $(1 - p_5)^{10}$. The probability that at least
one of the 10 items will be in the fifth class is consequently $1 - q_5^{10}$.
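Statements (v) and (vi) can be checked by direct arithmetic. The sketch below assumes illustrative values p₃ = 1/5 and p₅ = 1/4 (they are not taken from any table in the text) and treats the items as independent, as the statements above do.

```python
from fractions import Fraction

p3 = Fraction(1, 5)          # hypothetical chance that an item is in the third class
q3 = 1 - p3

first_only = p3 * q3         # (a) first item is, second is not
second_only = q3 * p3        # (b) second item is, first is not
both = p3 * p3               # (c) both are
neither = q3 * q3            # (d) neither is

at_least_one = first_only + second_only + both
print(at_least_one, 1 - neither, 2 * p3 - p3**2)   # three routes to the same value

# (vi): ten items, each with hypothetical chance p5 of lying in the fifth class
p5 = Fraction(1, 4)
at_least_one_of_ten = 1 - (1 - p5) ** 10
print(float(at_least_one_of_ten))
```

Working with `Fraction` keeps every probability exact, so the check that (a) + (b) + (c) + (d) = 1 is an equality rather than a rounding comparison.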

Two events are said to be independent when the success, or failure, 
of either of them has no effect on the success or failure of the other. 

When the success or failure of event A alters the probability of 
success of a second event B, then event B is said to be dependent 
on event A. 

If the occurrence of either of two events makes the probability
of occurrence of the other event zero, i.e., makes it impossible, the
two events are said to be mutually exclusive.

Success in the solution of problems in probability depends on the 
correct application of rules relating to the addition and multiplication 
of probabilities. 

The addition rule is applicable to events which are mutually 
exclusive; i.e., the probability that one of several mutually exclusive 
events will happen is the sum of the probabilities of the separate 
events. 






The multiplication rule applies to independent events. The 
probability that several independent events will happen is the 
product of their separate probabilities. 

The rule also applies to dependent events, where the probability
of the occurrence of A and B is the product of the probability of
event A and the probability of event B, on the assumption that A
has happened.

The majority of problems involve the application of both the 
addition and multiplication theorems. 

II. Theoretical Probabilities 

Let p be the probability that an event will be successful, and q 
the probability that it will fail. 

(1) $p = q = \tfrac{1}{2}$,

as in the case of the probability of getting a "head" in a single
toss of an unbiased coin.

If a single coin is spun 3 times, or what amounts to the same
thing, 3 coins are spun once, there are eight possible events:

H. H. H., which can occur in one way only;

H. H. T., which can occur in 3 ways, since the order is
immaterial;

H. T. T., which can also occur in 3 ways;

T. T. T., which can occur in one way only.

i.e., of the eight possible events

1 is favourable to 3 heads
3 are    ,,        2 heads, 1 tail
3 are    ,,        1 head, 2 tails
1 is     ,,        3 tails.

The theoretical probabilities of 3, 2, 1, 0 heads respectively in a
single tossing of 3 coins are therefore 1/8, 3/8, 3/8, 1/8, which are the terms
of the expansion of $(\tfrac{1}{2} + \tfrac{1}{2})^3$.

Similarly if 4 coins are tossed the probabilities of 4, 3, 2, 1, 0
heads are easily seen to be

1/16, 4/16, 6/16, 4/16, 1/16,

which are the terms of the expansion of $(\tfrac{1}{2} + \tfrac{1}{2})^4$.

In general if n coins are tossed, the successive coefficients in the
expansion of $(\tfrac{1}{2}t + \tfrac{1}{2})^n$ are the probabilities of n, (n − 1), (n − 2)
. . . 3, 2, 1, 0 heads.




t is inserted merely for convenience. Its utility will be made
obvious in subsequent examples.

In the above expansion there are (n + 1) terms. The (r + 1)th
term, or general term, is $\dfrac{n!}{(n-r)!\,r!}\left(\tfrac{1}{2}\right)^n t^r$, and the probability of r
heads is the coefficient of $t^r$ in this expansion, i.e., the probability of
r heads $= P(r) = \dfrac{n!}{(n-r)!\,r!}\left(\tfrac{1}{2}\right)^n$; $(\tfrac{1}{2}t + \tfrac{1}{2})^n$ is called the
generating function of probability in the tossing of n symmetrical
coins.


Example I. Eight coins are tossed. What is the probability of
obtaining: (a) 4 heads, (b) 5 heads or more, (c) at least 4 heads, (d) at
least 1 head?

$$(\tfrac{1}{2}t + \tfrac{1}{2})^8 = \tfrac{1}{256}\,(t^8 + 8t^7 + 28t^6 + 56t^5 + 70t^4 + 56t^3 + 28t^2 + 8t + 1)$$

(a) P(4) = coeff. of $t^4$ = 70/256.

(b) Probability of 5 heads or more = sum of coefficients of $t^8$, $t^7$, $t^6$, $t^5$
= (1 + 8 + 28 + 56)/256 = 93/256.

(c) Probability of at least 4 = (93 + 70)/256 = 163/256.

(d) Probability of at least 1 = 1 − 1/256 = 255/256.
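The coefficients used in Example I can be generated mechanically. The following is a minimal sketch using exact fractions; the helper name `heads_distribution` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

def heads_distribution(n, p=Fraction(1, 2)):
    """Return [P(0 heads), ..., P(n heads)]: the coefficients of t^r in (q + p t)^n."""
    q = 1 - p
    return [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]

P = heads_distribution(8)
p_exactly_4 = P[4]            # 70/256
p_5_or_more = sum(P[5:])      # 93/256
p_at_least_4 = sum(P[4:])     # 163/256
p_at_least_1 = 1 - P[0]       # 255/256
print(p_exactly_4, p_5_or_more, p_at_least_4, p_at_least_1)
```

`Fraction` prints results in lowest terms, so 70/256 appears as 35/128; the values are the same.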


Example II. Sixteen coins are tossed once. What is the probability 
of obtaining (a) exactly 8 heads, (b) exactly 11 heads ? 

(a) Probability $= \dfrac{16!}{8!\,8!}\left(\tfrac{1}{2}\right)^{16} = 0.196.$

(b) Probability $= \dfrac{16!}{11!\,5!}\left(\tfrac{1}{2}\right)^{16} = 4368 \times \left(\tfrac{1}{2}\right)^{16} \approx \dfrac{1}{15}.$

This, of course, is also the probability of getting exactly 5 heads.
Suppose the question were: What is the probability of obtaining at
least 11 heads?

This entails the summation of the several probabilities of obtaining 





16, 15, 14, 13, 12, 11 heads, which is a somewhat laborious calculation.
When n is large it may be quite out of the question to use this direct
method. An alternative method is given in Chapter IX.

Four figure logarithms have been used above. Seven figure tables 
may be needed in some cases, but in general P need not be stated 
correct to more than three places of decimals. 

Example III. If 20 dates are named at random, what is the probability
that 5 of them will be Sundays?

$$P(5) = \frac{20!}{15!\,5!}\left(\frac{6}{7}\right)^{15}\left(\frac{1}{7}\right)^{5} \approx \frac{1}{11}.$$

(2) $p \ne q$.

Example IV. An ordinary six-sided die is thrown 4 times: what are 
the probabilities of obtaining 4, 3, 2, 1, 0 " aces ” ? 

The g.f. $= (pt + q)^4 = (\tfrac{1}{6}t + \tfrac{5}{6})^4 = \tfrac{1}{1296}\,(t^4 + 20t^3 + 150t^2 + 500t + 625).$

The probabilities are, therefore:

No. of aces:      4         3          2          1          0
Probability:   1/1296   20/1296   150/1296   500/1296   625/1296      Σp = 1

Probability of 2 or more aces = (1 + 20 + 150)/1296 = 171/1296.

Probability of at least 1 ace = 1 − 625/1296 = 671/1296.

In general, the successive coefficients in the expansion of $(pt + q)^n$
give the theoretical probabilities of n, (n − 1), (n − 2), etc.,
successes at a single trial, or the successive coefficients in the
expansion of $(q + pt)^n$ give the theoretical probabilities of 0, 1, 2,
etc., successes at a single trial. It is often preferable to use the
second form of the expansion.




The general term of the expansion $(q + pt)^n$, i.e., the (r + 1)th term,
is $\dfrac{n!}{(n-r)!\,r!}\,q^{n-r}p^r t^r$ or ${}^{n}C_r\,q^{n-r}p^r t^r$,
where the coefficient of $t^r$ gives the probability of r successes, and
consequently (n − r) failures at a single trial.

The forms $\binom{n}{r}q^{n-r}p^r$ and ${}_{n}C_r\,q^{n-r}p^r$ are also used for the general
term by some writers.


Example V. Twelve ordinary dice are thrown once. What is the
probability of obtaining exactly 4 aces?

Probability of 4 aces = (4 + 1)th term of $(\tfrac{5}{6} + \tfrac{1}{6}t)^{12}$

$$= {}^{12}C_4\left(\tfrac{5}{6}\right)^8\left(\tfrac{1}{6}\right)^4 = \frac{12 \cdot 11 \cdot 10 \cdot 9}{1 \cdot 2 \cdot 3 \cdot 4} \times \frac{5^8}{6^{12}} = 0.0888 \approx \frac{1}{11}.$$

The probability of obtaining no ace = $(\tfrac{5}{6})^{12}$, and consequently the
probability of at least 1 ace = $1 - (\tfrac{5}{6})^{12}$ = 0.888.

Ratio of the (r + 1)th term to the rth term

$$\frac{(r + 1)\text{th term}}{r\text{th term}} = \frac{(n - r + 1)\,p}{q\,r}$$

The probability of obtaining 4 aces when 12 dice are thrown has been
shown to be 0.0888.

∴ (using ratio) probability of obtaining 5 aces

$$= 0.0888 \times \frac{(12 - 5 + 1)\tfrac{1}{6}}{\tfrac{5}{6} \times 5} = 0.0284$$

and probability of obtaining 6 aces $= 0.0284 \times \tfrac{7}{30} = 0.0066.$
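The ratio of successive terms gives a convenient way of building the whole series from its first term. The following is a minimal sketch, starting from P(0) = qⁿ and applying the ratio (n − r + 1)p/(qr) repeatedly; the function name `binomial_by_ratio` is an illustrative choice.

```python
from fractions import Fraction

def binomial_by_ratio(n, p):
    """Build P(0), ..., P(n) from P(0) = q^n using P(r) = P(r-1) * (n - r + 1) p / (r q)."""
    q = 1 - p
    probs = [q**n]
    for r in range(1, n + 1):
        probs.append(probs[-1] * (n - r + 1) * p / (r * q))
    return probs

aces = binomial_by_ratio(12, Fraction(1, 6))
print([round(float(aces[r]), 4) for r in (4, 5, 6)])
```

Because each term is obtained from the one before it, no factorials need be evaluated, which is the practical advantage of the ratio method when working by hand.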


(3) Tossing of Unsymmetrical Coins 

If one biased coin (p = probability of a head at a single throw)
is tossed repeatedly, then the g.f. of probability is $(pt + q)^n$ as above.
If 2 or more biased coins are involved with different values of p,
say $p_1$ and $p_2$, the situation is of course altered.


Example VI. The probability of a head at a single throw of a biased
coin is $p_1$ and that of a tail $q_1$. A second biased coin has probabilities
$p_2$ and $q_2$. If the 2 coins are tossed once there are four possible events,
viz. $H_1H_2$, $H_1T_2$, $T_1H_2$, $T_1T_2$.

These four events are exhaustive and mutually exclusive, i.e. the 




sum of their separate probabilities must be unity. The probabilities are
given by the coefficients in the product

$$(p_1t + q_1)(p_2t + q_2) = p_1p_2t^2 + p_1q_2t^1 + q_1p_2t^1 + q_1q_2,$$

where the index of t on the R.H.S. of the expression indicates to which
event any particular compound probability refers,

i.e., $p_1p_2$ (or $p_{12}$) is the probability of 2 heads ($t^2$)
      $(p_1q_2 + p_2q_1)$    ,,        ,,        1 head ($t^1$)
      $q_1q_2$               ,,        ,,        0 head ($t^0$).

If n biased coins with probabilities of success $p_1, p_2, \ldots, p_n$ are
tossed together the probability of x heads is given by the coefficient of $t^x$ in the expansion
of $(p_1t + q_1)(p_2t + q_2) \cdots (p_nt + q_n)$.

For example, if n = 3,

$$(p_1t + q_1)(p_2t + q_2)(p_3t + q_3) = p_{123}t^3 + (p_{12}q_3 + p_{13}q_2 + p_{23}q_1)t^2 + (p_1q_{23} + p_2q_{13} + p_3q_{12})t + q_{123},$$

where $p_{123} = p_1 \times p_2 \times p_3$, etc.

Whence if $p_1 = p_2 = p_3 = \tfrac{1}{3}$ we have

$$\tfrac{1}{27}t^3 + \tfrac{6}{27}t^2 + \tfrac{12}{27}t + \tfrac{8}{27},$$

indicating that the probability of 2 heads is 6/27 and that of no head 8/27.
Example VII. A biased coin is tossed n times. Each time a head
turns up the thrower wins 3d.; each time a tail turns up he loses 1d.
If the probability of a head at a single throw is 1/3 and the coin is tossed
four times, what are the thrower's possible gains or losses and their
respective probabilities?

The required probabilities are given by the expansion of
$(\tfrac{1}{3}t^3 + \tfrac{2}{3}t^{-1})^4$, i.e., of $3^{-4}t^{-4}(t^4 + 2)^4$.

$$3^{-4}t^{-4}(t^4 + 2)^4 = 3^{-4}t^{-4}(t^{16} + 8t^{12} + 24t^8 + 32t^4 + 16) = \tfrac{1}{81}t^{12} + \tfrac{8}{81}t^8 + \tfrac{24}{81}t^4 + \tfrac{32}{81} + \tfrac{16}{81}t^{-4}.$$

This shows that the probability of a gain of

 12d.  is  1/81
  8d.  ,,  8/81
  4d.  ,,  24/81
  0d.  ,,  32/81
 −4d.  ,,  16/81
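The expansion in Example VII amounts to tracking, toss by toss, the exponent of t (the running gain in pence) and its probability. The following is a minimal sketch of that bookkeeping; the function name `gain_distribution` and its parameters `win` and `loss` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

def gain_distribution(n_tosses, p_head, win=3, loss=-1):
    """Distribution of total gain: expands (p t^win + q t^loss)^n by tracking exponents of t."""
    dist = {0: Fraction(1)}
    for _ in range(n_tosses):
        nxt = defaultdict(Fraction)
        for gain, prob in dist.items():
            nxt[gain + win] += prob * p_head          # a head: gain rises by 3d.
            nxt[gain + loss] += prob * (1 - p_head)   # a tail: gain falls by 1d.
        dist = dict(nxt)
    return dist

dist = gain_distribution(4, Fraction(1, 3))
for gain in sorted(dist, reverse=True):
    print(f"{gain:>3}d.  {dist[gain]}")
```

Calling `gain_distribution(4, Fraction(1, 2))` instead gives the distribution for an unbiased coin under the same stakes.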

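Example VIII. The conditions are the same as in Example VII,
excepting that p = 1/2.

Then $(\tfrac{1}{2}t^3 + \tfrac{1}{2}t^{-1})^4 = 2^{-4}t^{-4}(t^4 + 1)^4$

$$= \tfrac{1}{16}t^{12} + \tfrac{4}{16}t^8 + \tfrac{6}{16}t^4 + \tfrac{4}{16} + \tfrac{1}{16}t^{-4}.$$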

Example IX. Three dice are tossed and the number of spots on the 
upturned faces counted, (a) What is the probability that the sum of 





the spots at a single throw is 12? (b) What is the probability that the
numbers shown are 2, 4, 6 if their sum is 12?

The total number of events is $6^3$ = 216.

A sum of 12 is obtained from any of the following combinations:

Combinations      No. of permutations
  1, 5, 6                 6
  2, 5, 5                 3
  2, 4, 6                 6
  3, 4, 5                 6
  3, 3, 6                 3
  4, 4, 4                 1
                         25

(a) ∴ Probability = 25/216.

(b) Probability = 6/25.

When only 3 dice are tossed the relative frequency of any sum of 
spots, from 3 to 18 inclusive, can easily be obtained as above. With 
more than 3 dice the direct method soon becomes impracticable; with 
four dice there are 1296 (i.e., 6 4 ) events and with five dice 7776 (i.e., 6 5 ) 
events. 
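For three dice the direct method is easily mechanised by listing all 216 ordered throws. The following is a minimal sketch of that enumeration; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

throws = list(product(range(1, 7), repeat=3))      # all 6^3 = 216 ordered throws
sum_counts = Counter(sum(t) for t in throws)       # frequency of each sum, 3 to 18

p_sum_12 = Fraction(sum_counts[12], len(throws))

# Conditional probability: the faces are 2, 4, 6 (any order) given that the sum is 12.
sum_12 = [t for t in throws if sum(t) == 12]
p_246_given_12 = Fraction(sum(sorted(t) == [2, 4, 6] for t in sum_12), len(sum_12))

print(p_sum_12, p_246_given_12)
```

The same enumeration with `repeat=5` lists 7776 throws, which is still quick for a machine although impracticable by hand.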

Example X. What is the probability of obtaining a sum of 10 in a 
single throw of five dice ? 


Combinations giving      No. of
a sum of 10.             permutations.
  1, 1, 1, 1, 6               5
  1, 1, 1, 2, 5              20
  1, 1, 1, 3, 4              20
  1, 1, 2, 3, 3              30
  1, 1, 2, 2, 4              30
  1, 2, 2, 2, 3              20
  2, 2, 2, 2, 2               1
                            126

p = 126/7776.

Example XI. Five dice are tossed once. What is the probability
that the numbers on their upper surfaces will be 1, 1, 2, 2, 4, in any
order?

P = 30/7776 = 0.004.


Note that the number of different arrangements of n things, when
p of them are alike, q of them are alike, r of them are alike and the
rest of them all different is $\dfrac{n!}{p!\,q!\,r!}$.

Then the number of different arrangements of 7, 8, 7, 7, 9, 8, 9, 1 is
$\dfrac{8!}{3!\,2!\,2!}$.





When n dice are thrown, the frequencies of the various sums of
spots (n to 6n) are given by the coefficients in the expansion of

$$(x + x^2 + x^3 + x^4 + x^5 + x^6)^n,$$

the total number of frequencies being $6^n$ and the frequency of a
sum of r being the coefficient of $x^r$ in the expansion.

But $(x + x^2 + \ldots + x^6)^n = x^n\,\dfrac{(1 - x^6)^n}{(1 - x)^n}$,

and therefore the frequency of a sum of r is given by the coefficient
of $x^{r-n}$ in the expansion of $\dfrac{(1 - x^6)^n}{(1 - x)^n}$.


Example XII. Four dice are tossed. What is the chance that the
sum of the spots is 12?

r = 12, n = 4, required chance = coeff. of $x^{12-4}$ in the expansion
of $\dfrac{(1 - x^6)^4}{(1 - x)^4}$, divided by $6^4$.

$$\frac{(1 - x^6)^4}{(1 - x)^4} = (1 - x^6)^4(1 - x)^{-4}$$

$$= (1 - 4x^6 + 6x^{12} - \ldots)(1 + 4x + 10x^2 + 20x^3 + 35x^4 + 56x^5 + 84x^6 + 120x^7 + 165x^8 + 220x^9 + 286x^{10} + 364x^{11} + 455x^{12} + 560x^{13} + \ldots).$$

Frequencies of    x^0  x^1  x^2  x^3  x^4  x^5   x^6      x^7        x^8        x^9
sum of             4    5    6    7    8    9    10       11         12         13
are therefore      1    4   10   20   35   56  (84−4)  (120−16)  (165−40)  (220−80)

The distribution of frequencies is symmetrical. The maximum
frequency is 146 for sums of 14. There is no need to proceed further
as the frequencies of sums of 4 and 24, 5 and 23, etc., are the same.

The frequency of sums of 12 (coeff. of $x^8$ in the above) is the coeff. of
$(1 \times 165x^8) - (4x^6 \times 10x^2)$, i.e., 125.

∴ required $p = \dfrac{125}{6^4} = \dfrac{125}{1296}$.

Frequency of sums of 13 = coeff. of $x^9$
$= 220x^9 - 4x^6 \times 20x^3 = 140x^9$,
i.e., 140.

It is not necessary to write out the expansion, as above, if the
frequency of one sum only is needed. Only those terms need be
considered which affect the particular power of x required.

For example, 5 dice are tossed, and the frequency of sums of 14 is
required, i.e., the coeff. of $x^9$ in the expansion of $(1 - x^6)^5(1 - x)^{-5}$.

Expansion $= (1 - 5x^6 + 10x^{12} \ldots)(1 + 5x + \ldots + 35x^3 + \ldots + 715x^9 + \ldots)$

Coeff. of $x^9$ = (1 × 715) − (5 × 35) = 540.
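The coefficients of (x + x² + … + x⁶)ⁿ can also be obtained by multiplying the polynomial out one die at a time. The following is a minimal sketch of that repeated multiplication; the function name `dice_sum_frequencies` and the `faces` parameter are illustrative.

```python
def dice_sum_frequencies(n_dice, faces=6):
    """Coefficients of (x + x^2 + ... + x^faces)^n: freq[s] = number of ways of throwing sum s."""
    freq = [1]                                 # the polynomial 1: one way to score 0 with no dice
    for _ in range(n_dice):
        nxt = [0] * (len(freq) + faces)
        for s, ways in enumerate(freq):
            for face in range(1, faces + 1):
                nxt[s + face] += ways          # multiply by x^face and accumulate
        freq = nxt
    return freq

four = dice_sum_frequencies(4)
five = dice_sum_frequencies(5)
print(four[12], four[13], four[14], five[14])
```

Setting `faces=4` gives the corresponding frequencies for tetrahedral dice marked 1 to 4.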






If an ordinary die is tossed a number of times the average, or
"expected", score is 3.5, i.e., $\dfrac{1 + 2 + 3 + 4 + 5 + 6}{6}$.

If n such dice are tossed the expected score is 3.5n.

Example XIII. (a) Calculate the theoretical frequencies of sums of
spots, 3 to 18, when 3 dice are thrown. Arrange the results in a
frequency distribution and verify that mean = 10.5, i.e., 3.5n, and
that $\sigma = \sqrt{8.75}$, i.e., $\sqrt{\dfrac{35n}{12}}$.

(b) Repeat the calculation when n = 4, and verify your results as 
above. 



Example XIV. A die in the form of a regular tetrahedron has its
faces marked 1, 2, 3, 4, respectively. When tossed the number on the
downward face is the score. The mean score, when n such dice are
thrown, is 2.5n and $\sigma = \sqrt{\dfrac{25n}{20}}$.

Obtain the relative frequency of scores from 4 to 16 inclusive when
four such dice are thrown. Calculate $\bar{x}$ and σ.

The frequency of a score of r is given by the coeff. of $x^{r-4}$ in
the expansion of $\dfrac{(1 - x^4)^4}{(1 - x)^4}$; dividing by $4^4$ = 256 gives the relative frequency.

The following hint may assist the reader to pick out the coefficient of
any term in the expansion of $(1 - x)^{-n}$.

$(1 - x)^{-3}$: coeff. of $x^{12}$ $= \dfrac{(12 + 1)(12 + 2)}{2!} = \dfrac{13 \times 14}{2} = 91.$

$(1 - x)^{-4}$: coeff. of $x^{12}$ $= \dfrac{(12 + 1)(12 + 2)(12 + 3)}{3!} = 455.$

$(1 - x)^{-5}$: coeff. of $x^{9}$ $= \dfrac{(9 + 1)(9 + 2)(9 + 3)(9 + 4)}{4!} = 715.$


Example XV. A triangular prism, rectangular faces marked 1, 2, 3
respectively, a tetrahedral die, faces marked 1, 2, 3, 4 respectively, and
an ordinary cubical die are tossed. The spots on the under faces of the
prism and tetrahedron and on the upper face of the cubical die are
counted. What is the probability of a sum of 7?

A sum of seven could be made up of any of the combinations 1, 1, 5;
1, 2, 4; 1, 3, 3; 2, 2, 3.

Total events = 3 × 4 × 6 = 72.

Combination.      Ways of occurrence.
  1, 1, 5                 1
  1, 2, 4                 4
  1, 3, 3                 3
  2, 2, 3                 3
                         11







By formula the required probability is the coeff. of $x^{7-3}$ in the
expansion of

$$\frac{1}{3} \times \frac{1}{4} \times \frac{1}{6} \times \frac{(1 - x^3)(1 - x^4)(1 - x^6)}{(1 - x)^3},$$

i.e., 11/72.

Verify that the frequencies of sums 3 to 13 are

1, 3, 6, 9, 11, 12, 11, 9, 6, 3, 1.
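The verification asked for in Example XV can be carried out by listing the 72 possible throws. The following is a minimal sketch of that enumeration; the names `prism`, `tetrahedron` and `cube` are illustrative labels for the three dice.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

prism, tetrahedron, cube = range(1, 4), range(1, 5), range(1, 7)

# Count every (prism, tetrahedron, cube) throw by its sum of spots.
counts = Counter(a + b + c for a, b, c in product(prism, tetrahedron, cube))
frequencies = [counts[s] for s in range(3, 14)]
p_sum_7 = Fraction(counts[7], sum(counts.values()))

print(frequencies, p_sum_7)
```

The same approach applies to any set of unlike dice: replace the three ranges by the face values of each die.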


Example XVI. How many solutions are there to the equation
a + b + c + d = r
when a, b, c, d can take any of the values 1, 2, 3, 4, 5, 6?

This is equivalent to the problem of finding the frequency of a sum of 
r when 4 ordinary dice are tossed. 

Example XVII. Two triangular prisms have their rectangular faces 
marked 1, 3, 6 and 2, 4, 5 respectively. If they are tossed together what 
sums may occur and what are their various probabilities ? 

$$\tfrac{1}{9}(t^1 + t^3 + t^6)(t^2 + t^4 + t^5) = \tfrac{1}{9}(t^3 + 2t^5 + t^6 + t^7 + 2t^8 + t^{10} + t^{11})$$

Sums      p
  3      1/9
  5      2/9
  6      1/9
  7      1/9
  8      2/9
 10      1/9
 11      1/9


Problems involving Variable Probabilities 

The chance of drawing an ace from a complete pack of cards is
4/52, i.e., 1/13. If the card drawn first is replaced before the second
draw, the chance of drawing two aces in two successive draws is
1/13 × 1/13, i.e., 1/169. If the card drawn first is not replaced the
chance is 4/52 × 3/51, i.e., 1/221.

Similarly if a bag contains 8 blue and 4 red balls the chance of
getting 2 blue balls in successive draws is $(\tfrac{2}{3})^2$ if the ball first drawn
is replaced and 2/3 × 7/11, i.e., 14/33, if it is not.

With replacements, therefore, the chance of a particular event is 
constant; without replacement it varies from drawing to drawing. 

Drawing Balls from Urn without Replacement 

Let n  = no. of balls in urn initially
    np = no. of white balls
    nq = no. of black balls
    r  = no. of balls drawn
    s  = no. of "successes"





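where s may have any value from 0 to r; then probability of s
successes

$$P(s) = \frac{{}^{np}C_s \times {}^{nq}C_{r-s}}{{}^{n}C_r}$$

or

$$P(s) = \frac{(np)!}{s!\,(np - s)!} \times \frac{(nq)!}{(r - s)!\,(nq - r + s)!} \times \frac{r!\,(n - r)!}{n!}.$$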


Example XVIII. An urn contains 7 white and 5 black balls. Four
are drawn at random without replacement. A white ball is a success.
Give the respective probabilities of 0, 1, 2, 3, 4 white balls.

No. of white balls.
        0      ${}^{5}C_4 \div {}^{12}C_4$                  =   5 ÷ 495  =   1/99
        1      ${}^{7}C_1 \times {}^{5}C_3 \div {}^{12}C_4$ =  70 ÷ 495  =  14/99
        2      ${}^{7}C_2 \times {}^{5}C_2 \div {}^{12}C_4$ = 210 ÷ 495  =  42/99
        3      ${}^{7}C_3 \times {}^{5}C_1 \div {}^{12}C_4$ = 175 ÷ 495  =  35/99
        4      ${}^{7}C_4 \div {}^{12}C_4$                  =  35 ÷ 495  =   7/99
                                                                             1
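The urn formula lends itself to direct computation with binomial coefficients. The following is a minimal sketch; the function name `urn_probability` and its argument names are illustrative.

```python
from fractions import Fraction
from math import comb

def urn_probability(white, black, drawn, successes):
    """P(s white balls) when `drawn` balls are taken without replacement."""
    return Fraction(comb(white, successes) * comb(black, drawn - successes),
                    comb(white + black, drawn))

# 7 white and 5 black balls, 4 drawn: probabilities of 0, 1, 2, 3, 4 white balls.
seven_white_five_black = [urn_probability(7, 5, 4, s) for s in range(5)]
print(seven_white_five_black)

# 11 white and 9 black balls, 9 drawn: probability of exactly 5 white balls.
print(round(float(urn_probability(11, 9, 9, 5)), 3))
```

`math.comb` returns zero when more items are asked for than are available, so impossible cases (for example, more black balls than the urn holds) come out with probability zero without special handling.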


Example XIX. Find the probability of obtaining 5 white balls, under 
the above conditions, when 

n = 20, np = 11, nq = 9, r = 9.

Probability $= {}^{11}C_5 \times {}^{9}C_4 \div {}^{20}C_9 = 0.347.$

The first white ball could be drawn in 11 ways. The second could
be drawn in 10 ways, and so on. Consequently the whole five could
be drawn in 11 × 10 × 9 × 8 × 7 ways.

The same 5 balls can be arranged in 5! ways, i.e., 120 ways, so that the
number of ways in which different sets, i.e., combinations, of 5 white
balls can be obtained is $\dfrac{11!}{5!\,6!}$ or ${}^{11}C_5$.

The four black balls can be got in $\dfrac{9!}{4!\,5!}$ ways, and, since the ${}^{11}C_5$ ways
of getting 5 white balls can be associated with any of the ${}^{9}C_4$ ways of
getting 4 black balls, the number of different ways in which 5 white
balls can occur in a drawing of 9 balls is ${}^{11}C_5 \times {}^{9}C_4$.

The number of different drawings of 9 balls out of 20 is ${}^{20}C_9$; hence
the required probability is

$$P(5) = \frac{11!}{5!\,6!} \times \frac{9!}{4!\,5!} \div \frac{20!}{9!\,11!}.$$

The mean of this series, of which $P(s) = \dfrac{{}^{np}C_s \times {}^{nq}C_{r-s}}{{}^{n}C_r}$ is the general term,
is rp.




Card Problems 

In the following examples cards are drawn from an ordinary pack,
and no card drawn is replaced.

Example XX. Four cards are drawn. What is the probability that 
they are all aces ? 

$P = \tfrac{4}{52} \times \tfrac{3}{51} \times \tfrac{2}{50} \times \tfrac{1}{49} = \tfrac{1}{270725}.$

Example XXI. Three cards are drawn. What is the chance that they
are a king, a queen and a knave, in any order?

$P = \tfrac{12}{52} \times \tfrac{8}{51} \times \tfrac{4}{50} = \tfrac{16}{5525} = 0.0029.$

Example XXII. If 4 cards are drawn, what is the probability that 
they are all of different suits ? 

$$P = 1 \times \frac{39}{51} \times \frac{26}{50} \times \frac{13}{49} = \frac{3!\,13^3}{51 \times 50 \times 49} = \frac{2197}{20825}.$$

Example XXIII. Four cards are drawn. What is the probability 
that they are four honours of the same suit ? 

P ~ 2 x Tl X To X 4 V = TTWTTS' 

Example XXIV. Four cards are drawn. What is the chance that they 
all have different values ? 

$P = 1 \times \tfrac{48}{51} \times \tfrac{44}{50} \times \tfrac{40}{49} = \tfrac{2816}{4165}.$

Example XXV. Four cards are drawn. What is the chance that they 
all have the same value ? 

$P = 13 \times \tfrac{4}{52} \times \tfrac{3}{51} \times \tfrac{2}{50} \times \tfrac{1}{49} = \tfrac{1}{20825}.$
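Each of these card problems is a product of successive conditional probabilities, the denominator falling by one at each draw. The following is a minimal sketch of that chain multiplication; the helper name `chain` is an illustrative choice.

```python
from fractions import Fraction

def chain(*steps):
    """Multiply successive conditional probabilities given as (favourable, remaining) pairs."""
    result = Fraction(1)
    for favourable, remaining in steps:
        result *= Fraction(favourable, remaining)
    return result

four_aces        = chain((4, 52), (3, 51), (2, 50), (1, 49))
king_queen_knave = chain((12, 52), (8, 51), (4, 50))
different_suits  = chain((52, 52), (39, 51), (26, 50), (13, 49))
different_values = chain((52, 52), (48, 51), (44, 50), (40, 49))
same_value       = chain((52, 52), (3, 51), (2, 50), (1, 49))

print(four_aces, king_queen_knave, different_suits, different_values, same_value)
```

Writing the first factor as 52/52 makes explicit that the first card may be any card; only the later draws are restricted.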

Mathematical Expectation 

Example XXVI. A bag contains 3 blue and 4 white counters. A 
blue counter is worth a penny, and a white counter is worth half-a- 
crown. A man is allowed to draw 3 counters at random. What is the 
value of his expectation ? 

His “ expectation ” is the average of all the sums which it is possible 
for him to draw, when each sum is weighted with the probability that 
it will be drawn. 


     (i).            (ii).      (iii).    (iv).          (v).
Possible sums.      Value.        P.     Weight.   Product of (ii) and (iv).
                    s.  d.                              £   s.  d.
3 pennies           0   3       1/35        1           0   0   3
2d. and 1 h.c.      2   8      12/35       12           1  12   0
1d. and 2 h.c.      5   1      18/35       18           4  11   6
3 h.c.              7   6       4/35        4           1  10   0
                                           35          £7  13   9

Weighted average = £7.6875 ÷ 35 = 4/4¾.


His expectation is said to be 4/4¾. It may be regarded as a fair price
to pay for being allowed to take a lucky dip. If a man paid 4/4¾ on a
large number of occasions, he should just about "break even".
Example XXVII. A man tosses a tetrahedral die, faces marked 1
to 4, twice. For the first toss he is to receive 1/- for each spot on the
underface. For the second throw he is to receive nothing if the number
of spots is less than at the first toss; otherwise he is to receive 1/- per
spot. What is his expectation of gain?

Amounts in Shillings Paid for the Various Numbers of Spots
in First and Second Throws

No. of spots         No. of spots (second throw).
(first throw).        1     2     3     4          Totals.
      1               2     3     4     5            14
      2               2     4     5     6            17
      3               3     3     6     7            19
      4               4     4     4     8            20
                                                     70

No. of possible tosses = $4^2$ = 16.

Total amount received if each possible toss occurred = 70/-.
Expectation = 70/16 = 4⅜/-.

Or the calculation may be made as below.

                   No. of ways in
Sum received       which said sum
     s.            could be gained.       Product.
     2                    2                   4
     3                    3                   9
     4                    5                  20
     5                    2                  10
     6                    2                  12
     7                    1                   7
     8                    1                   8
                         16                  70

Weighted average = 70/16 /- = 4⅜/-.
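The table of Example XXVII can be produced by listing the sixteen equally likely pairs of tosses. The following is a minimal sketch; the function name `payment` is an illustrative choice and amounts are in shillings.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def payment(first, second):
    """Shillings received: 1/- per spot on the first toss, plus 1/- per spot on the
    second toss unless it shows fewer spots than the first."""
    return first + (second if second >= first else 0)

outcomes = list(product(range(1, 5), repeat=2))       # 4^2 equally likely tosses
total = sum(payment(a, b) for a, b in outcomes)
expectation = Fraction(total, len(outcomes))          # in shillings
ways = Counter(payment(a, b) for a, b in outcomes)    # ways of gaining each sum

print(total, expectation, sorted(ways.items()))
```

The `ways` counter reproduces the second form of the calculation, in which each sum is weighted by the number of ways it can arise.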

Example XXVIII. There are 10 counters in a bag, 6 of them worth 
5/- each. The other four are of equal, but unknown, value. If the 
expectation from drawing a single /counter at random is 4/-, find the 
unknown value. 

Expectation = 6/10 (5/-) + 4/10 (x/-) = 4/-.

∴ x = 2/6.





222 INTRODUCTION TO STATISTICAL CALCULATIONS 

Example XXIX. (a) Find the expectation of a man who is allowed 
to draw 3 counters from the above bag. 

(b) Find also his expectation if he draws n counters, n being less 
than 10. 

(a) 12/-.

(b) 4n/-.

Example XXX. A and B in turn toss an ordinary die for a prize of
44/-. The first to toss a "six" wins. If A has first throw, what is his
expectation?

For the first throw A's chance is 1/6,
                    B's   ,,    5/6 × 1/6,
since A must fail before B throws.

For the second throw A's chance is $(\tfrac{5}{6})^2 \times \tfrac{1}{6}$,
since both A and B must fail before A's second throw.

For the second throw B's chance is $(\tfrac{5}{6})^3 \times \tfrac{1}{6}$, and so on.

Thus A's chance $= \tfrac{1}{6}\{1 + (\tfrac{5}{6})^2 + (\tfrac{5}{6})^4 + \ldots\}$
and B's   ,,   $= \tfrac{5}{6} \times \tfrac{1}{6}\{1 + (\tfrac{5}{6})^2 + (\tfrac{5}{6})^4 + \ldots\}$

A's chance is to B's chance as 1/6 : 5/36, i.e., 6 : 5.

A's expectation = 6/11 of 44/- = 24/-.

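Example XXXI. A, B and C cut a pack of cards in that order. The
first to cut a spade is to win a prize of 74/-. Show that their respective
expectations are 32/-, 24/-, 18/-.

A's chance $= \tfrac{1}{4}\{1 + (\tfrac{3}{4})^3 + (\tfrac{3}{4})^6 + \ldots\}$

B's   ,,   $= \tfrac{3}{4} \times \tfrac{1}{4}\{1 + (\tfrac{3}{4})^3 + (\tfrac{3}{4})^6 + \ldots\}$

C's   ,,   $= (\tfrac{3}{4})^2 \times \tfrac{1}{4}\{1 + (\tfrac{3}{4})^3 + (\tfrac{3}{4})^6 + \ldots\}$

whence A's chance : B's chance : C's chance
= 16 : 12 : 9

= 16/37 : 12/37 : 9/37.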
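The two turn-taking examples share one pattern: each player's chance is a geometric series with common ratio qᵐ, where m is the number of players. The following is a minimal sketch that sums the series in closed form; the function name `turn_chances` is an illustrative choice.

```python
from fractions import Fraction

def turn_chances(p, players):
    """Chance that each player, taking turns in order, is the first to succeed.
    Player k (counting from 0) wins with probability
    q^k * p * (1 + q^m + q^(2m) + ...) = q^k * p / (1 - q^m), where m = players."""
    q = 1 - p
    return [q**k * p / (1 - q**players) for k in range(players)]

dice = turn_chances(Fraction(1, 6), 2)     # A and B throwing for a six
cards = turn_chances(Fraction(1, 4), 3)    # A, B and C cutting for a spade

print([44 * c for c in dice])              # expectations in shillings, prize 44/-
print([74 * c for c in cards])             # expectations in shillings, prize 74/-
```

The chances always sum to unity, since the game is certain to end eventually.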

Example XXXII. A card is withdrawn from a pack. If it is a king, 
a queen or a knave, it is replaced; otherwise not. A second card is 
then withdrawn. 

What is a man’s expectation if he is to get £l for drawing a court 
card at the first draw and £2 for drawing a court card, i.e., king, queen 
or knave, at the second draw ? 

(i) First draw: chance of court card = 12/52,

expectation = 12/52 of £1.

(ii) Second draw:

(a) If result of first draw were a court card, p = 12/52,

expectation = 12/52 × 12/52 × £2.





(b) If result of first draw were not a court card, p = 12/51,

expectation = 40/52 × 12/51 × £2.

∴ total expectation = 3/13 of £1 + $(\tfrac{3}{13})^2$ of £2 + 10/13 × 12/51 of £2
≈ 14/-.

Line and Circle Problems 

Example XXXIII. Twelve persons, amongst whom are X and Y, sit 
down at random at a round table. What is the probability that there 
are r persons —r = 0, 1, 2, etc.—between X and Y ? 

X and Y can take ${}^{12}P_2$, i.e., 132 positions. Let r = 2. Then there
will be 2 persons between X and Y if the latter occupy chairs (1, 4), (2, 5),
(3, 6), (4, 7), (5, 8), (6, 9), (7, 10), (8, 11), (9, 12), (10, 1), (11, 2), (12, 3),
and since these positions can be reversed, there are in all 24 positions
out of 132 which will give the desired result.

$$p = \frac{24}{132} = \frac{2 \times 12}{12 \times 11} = \frac{2}{n - 1}.$$

The same result follows when r is given other values, i.e., the probability
that X and Y will be together, or will have 1, 3 or 4 persons
between them, is also $\dfrac{2}{n - 1}$, where n = number of persons involved.

The odds against X and Y being together are therefore (n − 3) : 2.

Does the formula, $p = \dfrac{2}{n - 1}$, hold good for every possible value of r?


Example XXXIV. X and Y stand in line, at random, with 10 other 
people. What is the probability that there will be 3 people between 
X and Y ? 

There will be 3 people between X and Y if the latter occupy positions
(1, 5), (2, 6), (3, 7), (4, 8), (5, 9), (6, 10), (7, 11), (8, 12), which, with these
positions reversed, gives 8 × 2 favourable positions out of 132.

p = 16/132.

In general, if n persons stand in line, the probability that there will be
r persons between X and Y is

$$p = \frac{2(n - r - 1)}{n(n - 1)}.$$

If X and Y are to be together, r = 0, and $p = \dfrac{2}{n}$.
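Both the line and the round-table results can be checked by counting ordered pairs of positions for X and Y. The following is a minimal sketch; the function names are illustrative, and for the round table the r persons are counted on either side between X and Y, as in the chair pairs listed above.

```python
from fractions import Fraction
from itertools import permutations

def p_between_line(n, r):
    """Probability that exactly r people stand between X and Y in a line of n."""
    seats = list(permutations(range(1, n + 1), 2))       # ordered (X, Y) positions
    favourable = sum(1 for x, y in seats if abs(x - y) == r + 1)
    return Fraction(favourable, len(seats))

def p_between_circle(n, r):
    """Probability that r people sit between X and Y, on either side, at a round table of n."""
    seats = list(permutations(range(n), 2))
    favourable = sum(1 for x, y in seats
                     if (y - x) % n == r + 1 or (x - y) % n == r + 1)
    return Fraction(favourable, len(seats))

print(p_between_line(12, 3), p_between_circle(12, 2))
```

Running `p_between_circle(12, r)` for each r in turn is one way of investigating the question posed at the end of Example XXXIII.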


Derangements 

In how many ways can n things (e.g., the first n numbers, or
letters of the alphabet) be arranged so that no number, or letter,
occupies its proper position?




The numbers 1, 2, 3, 4 can be arranged in 4!, i.e., 24, ways. By
writing these down the reader can verify the following:

No. in proper order.      Times.
         4                  1
         3                  0
         2                  6
         1                  8
         0                  9
                           24


Example XXXV. Four letters are written to four different people
and four envelopes correctly addressed. If the letters are put into the
envelopes by an illiterate person, i.e., at random, what is the probability
that no letter will occupy its correct envelope?

P = 9/24 = 3/8.

In general, number of derangements of n things

$$= n!\left\{1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \ldots + (-1)^n\frac{1}{n!}\right\}.$$

Thus for n = 6, the number of derangements is 265.

Example XXXVI. A young child, who does not know his letters,
puts cardboard letters A, B, C, D, E in a line. What is the probability
that at least one of the letters is in its correct place?

Number of derangements of 5 things

$$= 5!\left\{1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!}\right\} = 44.$$

$p = 1 - \tfrac{44}{120} = \tfrac{19}{30}.$
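The derangement formula and the table for four items can both be checked by machine. The following is a minimal sketch, with the formula evaluated in exact fractions and the table obtained by brute force; the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def derangements(n):
    """n! * (1 - 1/1! + 1/2! - ... + (-1)^n / n!), evaluated exactly."""
    series = sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))
    return int(factorial(n) * series)

def fixed_point_table(n):
    """table[k] = number of arrangements with exactly k items in their proper place."""
    table = [0] * (n + 1)
    for perm in permutations(range(n)):
        table[sum(i == v for i, v in enumerate(perm))] += 1
    return table

print(derangements(4), derangements(5), derangements(6))
print(fixed_point_table(4))
p_at_least_one_correct = 1 - Fraction(derangements(5), factorial(5))
```

The brute-force table is only practicable for small n, whereas the series formula remains usable however large n becomes.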

In solving probability problems it is often helpful to set out all 
the possible events and calculate the chance of each, thus obtaining 
a check, or a hint as to the value of an elusive one. 

Exercise 8 

1. Weekly wage (sh.):      30-  35-  40-  45-  50-  55-  60-  65-
   No. of wage earners:      9  108  488  230  112   30   16    7

An individual is taken at random from the above group. State the
probability that

(i) His wage was under 40/-.

(ii) His wage was 55/- or over.

(iii) His wage was either between 45/- and 50/- or between 35/- and
40/-.




2. Two individuals are taken at random from the wage-group of Question 1.
State the probability that:

(i) Both their wages were under 40/-. 

(ii) Both their wages were 55/- or over. 

(iii) Both were either in the 45/- to 50/- class or in the 35/- to 40/- class. 

(iv) At least one of them was in receipt of 50/- or over. 


                 English Life Table No. 10 (males)

    Age (x)      l_x         q_x       e°_x
       0       100,000     0.07186     58.74
       5        90,000     0.00343     60.11
      10        89,023     0.00146     55.79
      15        88,360     0.00197     51.19
      20        87,245     0.00316     46.81
      25        85,824     0.00330     42.54
      30        84,416     0.00340     38.21
      35        82,885     0.00421     33.87
      40        80,935     0.00562     29.62
      45        78,357     0.00799     25.51
      50        74,794     0.01128     21.60
      55        70,041     0.01614     17.89
      60        63,620     0.02415     14.43
      65        54,899     0.03791     11.30
      70        43,361     0.06035      8.62
      75        29,665     0.09519      6.43
      80        16,199     0.14500      4.74
      85         6,377     0.21048      3.50
      90         1,609     0.28614      2.63
      95           232     0.37619      1.97
     100            15     0.48350      1.48

Table No. 10 is based on the 1931 Census and the deaths of 1930-32.
The column headed "l_x" shows the numbers expected to survive to exact
age x out of each 100,000 live births; "q_x" is the probability that a person
aged exactly x will die within one year; "e°_x" is the "complete expectation
of life", that is, the average future lifetime of a person aged exactly x.

[Annual Abstract of Statistics, 1935-46. Source: Registrar-General]

3. Use the English Life Table No. 10 to determine the following probabilities:

(i) That a newly born boy will live to be 30.

(ii) That a newly born boy will die before 40.

(iii) That a man just 30 will live to be 50.

(iv) That a man just 30, and his son aged just 5, will both survive 20
years.

(v) That neither father nor son will survive 20 years.

(vi) That at least one of them will survive 20 years.

4. Use the table to find the probability:

(i) That male twins, aged just 75, will both die within the year.

(ii) That one of the pair will live, and the other die, within the year.

(iii) That at least one of them will survive the year.

(iv) That of 10 boys, of exact age 15, 7 will attain 60 years.

(v) That at least 7 of the 10 boys will attain age 60.

5. (i) Nine persons, among whom are A and B, stand in line at random.
What is the probability that there are (a) 2 persons, (b) 3 persons, between
A and B?

(ii) Two shillings and 4 other coins are put in line at random. What is the
probability that the end coins are both shillings?

6. Ten half-pennies and 2 shillings are placed in a circle at random. What
are the odds against the 2 shillings being together?

7. In a maze there are 6 choices of alternative turns to be made before
reaching the centre. If the odds in favour of a correct choice are 3 : 2 on
each occasion, what is the probability that a person will make all the choices
correctly at the first trial?

8. Out of 50 rare books, 3 of which are especially valuable, 5 are taken at
random by a thief. What is the probability (a) that none of the 3 is included,
(b) that 2 of the 3 are included?

9. A coin is tossed 12 times. What is the chance of getting heads exactly
as many times in the first 8 throws as in the last 4 throws?

10. During the war 1 ship in 10 was sunk on the average in making a
certain voyage. What was the probability that exactly 3 out of a convoy of
6 ships would arrive safely?

11. If on the average 1 birth in 80 is a case of twins, what is the probability
that there will be at least 1 case of twins in a maternity hospital on a day
when 20 births occur?

12. Five men in a company of 20 are graduates. If 3 men are picked out
of the 20 at random, what is the probability that they are all graduates?
What is the probability of at least 1 graduate?

13. Two men of equal skill at chess play 20 games together. What is the
probability that 1 of them will win exactly 14 games?

14. A bag contains 8 white and 6 black balls. If 5 balls are drawn at
random, what is the probability (a) that 3 are white, (b) that 3 or more are
white?

15. If there are 2 events, A and B, and the probability of B is x when A
has happened, and y when A has failed to happen, and if the probability of
A is p, find the probability of B.

16. An urn contains 5 blue and an unknown number, x, of red balls. If 
when 2 balls are drawn the chance of both being blue is fa, find x. 

17. How many times must a coin be tossed in order that the probability
of getting at least 1 head shall be approximately 0.95? How many times
must it be tossed in order that the odds in favour of at least 1 head shall be
100 to 1?



18. What is the probability of getting a double six at least once in 10 throws
of two dice?

19. How many throws of two dice must be made if the probability of 
getting a double-six at least once is to be approximately one-half ? 

20. The chances of 3 marksmen of scoring a bull at a single shot are £, 
respectively. If all 3 fire together, what are the possible events and their 
respective probabilities ? 

21. A bag contains x red, y green and z blue balls. What is the minimum
number of balls that must be drawn at random to ensure getting at least 3 of
the same colour? x, y and z are each greater than 3.

22. If the chance that a certain event will occur twice in 3 trials is
one-third the chance that it will occur once in 3 trials, find the probability
that it will occur in a single trial.

23. Nine balls are drawn at random from a bag containing 11 white and
9 black balls. Find the probability that 5 are white.

24. Set down the probabilities of all the possible events in Question 23 and
find the mean and standard deviation.

25. Obtain the theoretical frequencies of sums of "spots", 3 to 12, when
3 tetrahedral dice, faces marked 1, 2, 3, 4, respectively, are tossed, and compute
the mean number and the standard deviation.

26. A symmetrical coin is tossed 10 times. What is the probability of at
least 9 heads? Show that if the probability of an event at a single trial is p,
the probability of at least (n - 1) successes in n trials is $p^{n-1}\{n - (n - 1)p\}$.

27. One number is taken from each of the two groups 1, 2, 4, 5 and 2, 3, 6, 7
at random. Set out the various possible sums and the probability of each.

28. Counters marked 2, 3, 4, respectively, are placed in a bag and drawn
one at a time with replacement. If 3 drawings are made, what is the chance
that the total score is 8? Write down the formula from which p(r) can be
obtained.

29. The odds against a certain event are 7 : 4. The odds in favour of
another independent event are 4 : 3. What is the probability that at least
one of the events will happen?

30. Eight tickets are marked 0-7 inclusive. They are picked out at 
random, one at a time with replacement. If 3 drawings are made, what is 
the probability that the sum of the numbers on the 3 tickets is 9 ? 

31. An ordinary die is tossed twice, and the difference between the numbers 
of spots turned up noted. What is the most likely difference and what is its 
probability ? What is the probability of a difference of 3 ? What is the 
mean difference ? 

32. If 2 dice are thrown and 5 coins tossed, find the probability that the 
difference between the numbers of spots shown on the dice will be the same 
as the number of heads shown by the coins. 

33. X tossed 6 pennies twice and Y an ordinary die twice. What are 
their respective expectations if each of them is to get 5/- if the difference 
between his first and second scores—i.e., number of heads or number of spots— 
is 3 or more ? 

34. A man may get one, or the other, or both of prizes valued at x and y
shillings respectively. His chance of getting x shillings is $p_1$ and his chance
of getting y shillings is $p_2$. What is his total expectation?

35. A prize is to be given to the one of three men, X, Y and Z, who is the
first to toss 11 with 3 dice. If the men toss in the given order until a score of
11 is obtained, what are their respective chances of winning?

36. Five men draw cards, from a complete pack in each case, for a prize of
£3 5s. 1d., which is to be won by the first to draw a spade. If they draw in
succession until a spade is drawn, what is the expectation of the third person?

37. Two cards are drawn at random from an ordinary pack. If either of 
them is a king, or if both are kings, both cards are replaced, otherwise they 
are not replaced. Another card is then drawn at random. What is the 
probability that it is a king ? 

38. A bag contains 18 counters. Ten of the counters are each of value a,
4 are each of value b, and the remaining 4 are each of value 2b. A man draws
2 counters at random. Find the ratio of a to b if the man's expectation is

4 a 

T 

39. A triangular prism has its three rectangular faces marked 1, 2, 3,
respectively. It is tossed n times. If 1 occurs on the downward face, 2 is
added to the score, while if 3 occurs, 4 is subtracted from the score. What
is the generating function of probability in this case?

If the prism is tossed twice, what is the mean addition to the score?

40. Fifteen persons sit at a round table. What are the odds against 2
particular persons (a) sitting together, (b) having 3 other persons between
them? What is the probability that the 2 will have 5 persons between them?

41. (i) A and B stand in line with 11 other persons. What is the probability
that they are standing together? What is the probability that there are
4 persons between them?

(ii) Five florins and 3 pennies are placed in a line at random. What is the
probability that the end coins are both florins? What is the probability if
there are x florins and y pennies?

42. An urn contains M balls; a second urn contains N balls. r balls are
taken from urn 1 and placed in urn 2. r balls are then drawn at random
from urn 2. What is the probability that s of the r balls (s < r) came
originally from urn 1?

43. A pack of cards is cut at random into 4 piles. What is the probability
that the top card in each pile is (a) a heart, (b) a four, (c) king or queen?

44. The sides of a biased four-sided die are marked 1, 2, 3, 4, respectively.
The chance of a 1 or 3 being thrown is twice that of a 2 or a 4. The die is
thrown twice. Find the respective probabilities of sums of 2, 3, . . . 7, 8.
What is the mean sum?

45. The probabilities of throwing 1, 2, 3 or 4 with a four-sided die at a
single throw are l, respectively. If the die is thrown twice, find the 

probabilities of sums 2, 3, . . 7, 8, respectively. What is the mean sum ? 

46. Three houses were advertised to be let in a certain district, and 3 men
made separate application for a house. What is the probability (i) that all
3 men made application for the same house, (ii) that each of the 3 applied
for a different house, (iii) that 2 of them applied for the same house and the
third for one of the other houses?

47. Suppose that there had been 4 houses and 4 applicants. What is the 
probability (i) that all 4 applied for the same house, (ii) that each of the 4 
applied for a different house ? 

48. A man tosses an ordinary die twice. For the first throw he is to receive
1/- for each spot turned up. For the second throw, if the number of spots
is equal to or less than the number in the first throw, he gets nothing;
otherwise he gets 1/- per spot turned up at the second throw. What is his
expectation?

49. A man has 3 pennies, 2 shillings and 4 half-crowns in his pocket. He
pulls out 3 coins at random. How many different sums can he pull out, and
which is most likely? What is the expectation?

50. Two balls are drawn at random from a bag containing 6 white and 4
black balls, and not replaced. Two black balls are then put in the bag. If a
ball is now drawn at random what is the chance that it is black? If a prize
of £1 5s. is to go to A or B according as the ball is black or white, what is B's
expectation?

51. If in the above example, after 2 balls are drawn at random and not
replaced, and 2 black balls are put in the bag, 2 balls are drawn at random,
what is the probability (a) that both are black, (b) that both are white?

52. A tosses 5 dice and B tosses 4 dice. Compare their chances of obtaining
a sum of 13.

53. Cards marked 1, 2, 3, 4, 5, respectively, are put in a line at random.
What is the chance (i) that at least one of the numbers occupies its proper
position, (ii) that 3 or more of the numbers are in their proper positions?

54. In the equation a + b + c + d = x, a, b, c, d can take any of the
values 1 to 6 inclusive. In how many possible solutions would x have the
value 11?

55. A bag contains 8 black and 4 white counters. A second bag contains
4 black and 8 white counters. A black counter is worth a shilling, a white
counter has no value. One counter is drawn at random from each bag, and,
after each has been drawn, is placed in the other bag. What is then a fair
price to pay for the contents of the first bag?

56. If, in Question 55, there are initially x black and y white counters in the
first bag, and y black and x white counters in the second bag, the other conditions
remaining the same, what is the fair price for the contents of the first bag?

57. (i) If the chance that an event will happen twice in four trials is 0.1536,
what is the chance that it will happen in a single trial?

(ii) The probability that an event will happen three times in five trials is
| of the probability that it will occur once in five trials. The probability that 
another independent event will occur five times in seven trials is four times the 
probability that it will occur six times in seven trials. What is the probability 
that both events will succeed at a single trial?

58. Two men, X and Y, throw 2 dice in turn for a prize, Y having first
throw. X wins if he throws 7 before Y throws 8, and Y wins if he throws 8
before X throws 7. What are their respective chances?

59. Compare the chances of X and Y (data of Question 58) if, after each
unsuccessful throw, the value of the prize is reduced by 5% of its value before
the throw was made.

60. A bag contains $np_1$ white and $nq_1$ black balls. A second bag contains
$np_2$ white and $nq_2$ black balls. One of the bags is chosen at random and a
ball taken from it at random. Show that the probability that the ball will
be black is $1 - \tfrac{1}{2}(p_1 + p_2)$.



CHAPTER IX 


BINOMIAL, POISSON AND NORMAL 
DISTRIBUTIONS 

I. The Binomial Distribution, $(q + p)^n$

(i) First and Second Factorial Moments about Zero

    No. of       Probability of
    successes    x successes
    x            P(x)                                  xP(x)                                 x(x-1)P(x)

    0            q^n                                   -                                     -
    1            n q^(n-1) p                           n q^(n-1) p                           -
    2            {n(n-1)/2!} q^(n-2) p^2               n(n-1) q^(n-2) p^2                    n(n-1) q^(n-2) p^2
    3            {n(n-1)(n-2)/3!} q^(n-3) p^3          {n(n-1)(n-2)/2!} q^(n-3) p^3          n(n-1)(n-2) q^(n-3) p^3
    4            {n(n-1)(n-2)(n-3)/4!} q^(n-4) p^4     {n(n-1)(n-2)(n-3)/3!} q^(n-4) p^4     {n(n-1)(n-2)(n-3)/2!} q^(n-4) p^4
    . . .        . . .                                 . . .                                 . . .
    n            p^n                                   . . .                                 . . .

    Sums         1                                     np                                    n(n-1)p^2

$$\Sigma P(x) = (q + p)^n = 1^n = 1$$

$$\Sigma xP(x) = np\{q^{n-1} + (n - 1)q^{n-2}p + \{(n - 1)(n - 2)/2!\}q^{n-3}p^2 + \dots\} = np(q + p)^{n-1} = np$$

i.e., $\nu_{(1)} = \bar{x} = np$.

$$\Sigma x(x - 1)P(x) = n(n - 1)p^2\{q^{n-2} + (n - 2)q^{n-3}p + \{(n - 2)(n - 3)/2!\}q^{n-4}p^2 + \dots\} = n(n - 1)p^2(q + p)^{n-2}$$

i.e., $\nu_{(2)} = n^2p^2 - np^2$.

(ii) Variance of $(q + p)^n$

Since $\nu_{(2)} = n^2p^2 - np^2$ and $\nu_2 = \nu_{(2)} + \bar{x}$,

$$\nu_2 = n^2p^2 - np^2 + np = np\{np + (1 - p)\} = n^2p^2 + npq$$

$$\mu_2 = n^2p^2 + npq - (np)^2 = npq$$

and $\sigma = \sqrt{npq}$.

The result, $\sigma = \sqrt{npq}$, is of great importance.
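The sums above can be confirmed numerically for any particular n and p. The sketch below is illustrative only (the function name and the chosen values n = 10, p = 0.3 are assumptions): it builds the terms of the binomial expansion, forms the first and second factorial moments about zero, and derives the variance from them.

```python
from math import comb


def binomial_constants(n: int, p: float):
    """Return (sum of P(x), sum of xP(x), sum of x(x-1)P(x), variance)."""
    q = 1 - p
    pmf = [comb(n, x) * q ** (n - x) * p ** x for x in range(n + 1)]
    total = sum(pmf)
    first = sum(x * px for x, px in enumerate(pmf))                   # should equal np
    second_factorial = sum(x * (x - 1) * px for x, px in enumerate(pmf))  # n(n-1)p^2
    variance = second_factorial + first - first ** 2                  # should equal npq
    return total, first, second_factorial, variance


print(binomial_constants(10, 0.3))
```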

(iii) Other Constants of the Binomial Distribution

$\nu_{(3)}$ and $\nu_{(4)}$ can be obtained by evaluating $\Sigma x(x - 1)(x - 2)P(x)$
and $\Sigma x(x - 1)(x - 2)(x - 3)P(x)$ in the manner of the above table.
An alternative method is suggested on page 248.

$$\mu_3 = npq(q - p)$$

$$\mu_4 = 3p^2q^2n^2 + pqn(1 - 6pq)$$

whence

$$\beta_1 = \frac{(q - p)^2}{npq}, \qquad \beta_2 = 3 + \frac{1 - 6pq}{npq}.$$


(iv) Limiting Form of the Binomial Distribution when n is Large 

The calculation of the various terms of the binomial expansion, 
when n is quite moderate in value, is tedious, and rapidly becomes 
impossibly so as n increases. 

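The general term of the binomial expansion, i.e., the probability
of x successes, is

$$P(x) = \frac{n!}{x!\,(n - x)!}\,p^x q^{n-x} \qquad (1)$$

Thus the chance of exceeding the mean number of successes, pn, by x is

$$P(pn + x) = \frac{n!}{(pn + x)!\,(qn - x)!}\,p^{pn+x} q^{qn-x} \qquad (2)$$

It can be shown that this expression is approximately equal to

$$\frac{1}{\sqrt{2\pi npq}}\,e^{-x^2/2npq} \qquad (3)$$

and since the standard deviation of $(p + q)^n$ is $\sqrt{npq}$ this becomes

$$P(x) \approx \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2} \qquad (4)$$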


Example I. Twenty coins are tossed, and it is required to find the
probability of getting exactly 15 heads, i.e., an excess of 5 over the
mean.

(i) By (1):

$$P(15) = \frac{20!}{15!\,5!}\left(\frac{1}{2}\right)^{20} = 0.0148, \text{ using 4-figure logs.}$$

(ii) By (3): n = 20, p = q = 1/2, x = 15 - 10 = 5;
e = 2.71828 . . . , σ² = 5.

    1/2 log 10π = 0.7486
    2.5 log e   = 1.0857
                  ------
                  1.8343

log P(15) = 0 - 1.8343 = $\bar{2}.1657$
∴ P(15) = 0.0146.
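With a computer the logarithm tables are unnecessary, and the two routes of Example I can be compared directly. The following sketch is an illustration only; the function names are assumptions. It evaluates the exact binomial term and the approximation $e^{-x^2/2npq}/\sqrt{2\pi npq}$, where x is the excess over the mean.

```python
from math import comb, exp, pi, sqrt


def exact_binomial(n: int, p: float, k: int) -> float:
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def normal_approximation(n: int, p: float, k: int) -> float:
    """Approximate probability of k successes, using the excess x = k - np."""
    variance = n * p * (1 - p)
    x = k - n * p
    return exp(-x * x / (2 * variance)) / sqrt(2 * pi * variance)


exact = exact_binomial(20, 0.5, 15)
approx = normal_approximation(20, 0.5, 15)
print(f"exact {exact:.4f}, approximation {approx:.4f}")
```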





II. The Poisson Distribution 

When p is equal to, or nearly equal to, q , formulae (1) and (4) are 
in close agreement, even with quite moderate values of n. 

When p is small, formula (4) may still be used, with caution, as 
an approximation to the binomial so long as np is not small. 

When p is very small, so that even though n is large np is small, 
the approximation may be poor. 

If p approaches zero while n approaches infinity in such a way
that np remains constant, it can be shown that

$$\frac{n!}{x!\,(n - x)!}\,p^x q^{n-x} \approx \frac{(np)^x}{x!}\,e^{-np}$$

or, writing m = np,

$$\approx \frac{m^x}{x!}\,e^{-m}.$$

This is Poisson's approximation to the binomial when the last-named
conditions are fulfilled.

$$P(x) = \frac{m^x}{x!}\,e^{-m} \qquad (5)$$

is the general term of the Poisson distribution, which may be written

$$e^{-m}\left(1 + m + \frac{m^2}{2!} + \dots\right) \quad \text{or} \quad \sum_{x=0}^{\infty} \frac{m^x}{x!}\,e^{-m}.$$


Example II. In turning out a certain component in a factory the
average number of defectives is 1%. What is the probability of two or
more defectives in a sample of 100?

Using formula (1), with seven-figure logarithms:

$$P(0) = (0.99)^{100} = 0.36603$$

$$P(1) = 100(0.99)^{99}(0.01) = 0.36973$$

$$P(2) = \frac{100 \times 99}{2!}(0.99)^{98}(0.01)^2 = 0.18486$$

and so on.

Probability of 2 or more is 1 - 0.73576, i.e., 0.26424.

Using formula (5), with m = np = 1:

$$P(0) = e^{-1} = 0.3679$$

$$P(1) = e^{-1} = 0.3679$$

$$P(2) = \frac{e^{-1}}{2!} = 0.1840.$$

∴ Probability of 2 or more = 1 - 0.7358 = 0.2642.
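The agreement between the binomial and Poisson figures in Example II can be reproduced as follows. This is a minimal sketch, assuming only the standard forms of the two distributions; the helper names are illustrative.

```python
from math import comb, exp, factorial


def binomial_pmf(n: int, p: float, x: int) -> float:
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(m: float, x: int) -> float:
    return m ** x * exp(-m) / factorial(x)


n, p = 100, 0.01
m = n * p  # Poisson mean

# "Two or more" is the complement of "none or one".
binomial_tail = 1 - sum(binomial_pmf(n, p, x) for x in (0, 1))
poisson_tail = 1 - sum(poisson_pmf(m, x) for x in (0, 1))
print(f"binomial {binomial_tail:.5f}, Poisson {poisson_tail:.4f}")
```

The same `poisson_pmf` helper, called with m = 0.5, gives the terms needed for a mean of one-half.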



The reader should note that the value of 0! is 1, so that the
mathematical crime of dividing by zero has not been committed.

Example III. On the average a certain event happens once in 40
trials. What is the probability of 3 successes in the next 20 trials?
What is the probability of 3 or more successes?

Here m = np = 20 × 1/40 = 1/2, so that

$$P(x) = \frac{(\tfrac{1}{2})^x}{x!}\,e^{-1/2}$$

whence

$$P(0) = e^{-1/2} = 0.6066$$

$$P(1) = \tfrac{1}{2}e^{-1/2} = 0.3033$$

$$P(2) = \frac{(\tfrac{1}{2})^2}{2!}\,e^{-1/2} = 0.0758$$

$$P(3) = \frac{(\tfrac{1}{2})^3}{3!}\,e^{-1/2} = 0.0126$$

P(3) = 0.013.

Probability of 3 or more = 1 - 0.9857 = 0.0143.

Expanding the binomial $(\tfrac{39}{40} + \tfrac{1}{40})^{20}$ we get

    P(0) = 0.6026
    P(1) = 0.3091
    P(2) = 0.0753.


Moments of the Poisson Distribution

    No. of
    successes
    x     P(x)             xP(x)            x(x-1)P(x)       x(x-1)(x-2)P(x)   x(x-1)(x-2)(x-3)P(x)

    0     e^(-m)           -                -                -                 -
    1     e^(-m) m/1!      e^(-m) m         -                -                 -
    2     e^(-m) m^2/2!    e^(-m) m^2/1!    e^(-m) m^2       -                 -
    3     e^(-m) m^3/3!    e^(-m) m^3/2!    e^(-m) m^3/1!    e^(-m) m^3        -
    4                      e^(-m) m^4/3!    e^(-m) m^4/2!    e^(-m) m^4/1!     e^(-m) m^4
    5                                       e^(-m) m^5/3!    e^(-m) m^5/2!     e^(-m) m^5/1!
    6                                                        e^(-m) m^6/3!     e^(-m) m^6/2!
    7                                                                          e^(-m) m^7/3!

$$\Sigma P(x) = e^{-m}\left(1 + m + \frac{m^2}{2!} + \dots\right) = e^{-m}e^{m} = 1$$

$$\Sigma xP(x) = e^{-m}m\left(1 + m + \frac{m^2}{2!} + \dots\right) = e^{-m}me^{m} = m = \nu_{(1)}$$

$$\Sigma x(x - 1)P(x) = e^{-m}m^2\left(1 + m + \frac{m^2}{2!} + \dots\right) = e^{-m}m^2e^{m} = m^2 = \nu_{(2)}$$

$$\Sigma x(x - 1)(x - 2)P(x) = e^{-m}m^3e^{m} = m^3 = \nu_{(3)}$$

$$\Sigma x(x - 1)(x - 2)(x - 3)P(x) = e^{-m}m^4e^{m} = m^4 = \nu_{(4)}$$



Whence

    ν_1 = mean = m
    ν_2 = m^2 + m                      μ_2 = m^2 + m - (m)^2 = m
    ν_3 = m^3 + 3m^2 + m               μ_3 = m
    ν_4 = m^4 + 6m^3 + 7m^2 + m        μ_4 = m + 3m^2
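These Poisson constants can be verified numerically by summing enough terms of the distribution. The sketch below is illustrative (the function names, the truncation at 80 terms and the choice m = 2 are assumptions): the factorial moments should come out as powers of m, and the central moments as m, m and m + 3m².

```python
from math import exp, factorial


def poisson_pmf(m: float, x: int) -> float:
    return m ** x * exp(-m) / factorial(x)


def factorial_moment(m: float, k: int, terms: int = 80) -> float:
    """Sum of x(x-1)...(x-k+1) P(x), truncated after `terms` terms."""
    total = 0.0
    for x in range(terms):
        falling = 1
        for j in range(k):
            falling *= x - j
        total += falling * poisson_pmf(m, x)
    return total


def central_moment(m: float, k: int, terms: int = 80) -> float:
    """Sum of (x - m)^k P(x), truncated after `terms` terms."""
    return sum((x - m) ** k * poisson_pmf(m, x) for x in range(terms))


m = 2.0
print([round(factorial_moment(m, k), 6) for k in (1, 2, 3, 4)])
print([round(central_moment(m, k), 6) for k in (2, 3, 4)])
```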


Example IV. Calculate the mean number of defectives per sample
from the following table. Calculate also the frequencies of the Poisson
distribution which has the same mean, and the same total frequency.

    No. of        No. of                   Nearest
    defectives    samples                  unit
    x             f           P(x)         P(x) × 100

    0             11          0.135        14
    1             28          0.271        27
    2             30          0.271        27
    3             18          0.181        18
    4              8          0.090         9
    5              4          0.036         4
    6              1          0.012         1
    7 and over     -          0.004         -
                 ---          -----       ---
                 100          1.000       100

$\bar{x} = 2$.

$$P(x) = \frac{2^x}{x!}\,e^{-2}$$

Column four gives the theoretical frequencies.
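The fitting procedure of Example IV can be sketched in a few lines of Python. This is an illustration rather than the author's working; the variable names are assumptions. The mean is taken from the observed table and then used as the Poisson parameter m.

```python
from math import exp, factorial

defectives = [0, 1, 2, 3, 4, 5, 6]
samples = [11, 28, 30, 18, 8, 4, 1]      # observed frequencies f

total = sum(samples)
mean = sum(x * f for x, f in zip(defectives, samples)) / total

# Theoretical Poisson frequencies with the same mean and total frequency.
fitted = [total * mean ** x * exp(-mean) / factorial(x) for x in defectives]

for x, f, t in zip(defectives, samples, fitted):
    print(x, f, round(t))
```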


Example V. A car hire firm has 2 cars, which it hires out day by day.
The number of demands for a car on each day is distributed as a Poisson
distribution, with mean 1.5.

Calculate the proportion of days on which neither car is used, and the
proportion of days on which some demand is refused.

If each car is used an equal amount, on what proportion of days is
one of the cars not in use? What proportion of demands has to be
refused? (R.S.S. Certificate 1949)

    Demands      Proportion
    per day      of days
    x            P(x)

    0            0.2231
    1            0.3346
    2            0.2510
    3            0.1255
    4            0.0471
    5            0.0141
    6            0.0035
    7            0.0008

(i) Proportion of days when neither car is used = 0.223.

(ii) Proportion of days on which some demand is refused = 0.191.

(iii) Proportion of days on which one only of the cars is not in use = 0.167.

    Demands      No. of days        Total        Demands
    per day      per 1000 days      demands      refused

    0             223                  -            -
    1             335                335            -
    2             251                502            -
    3             126                378          126
    4              47                188           94
    5              14                 70           42
    6               3                 18           12
    7               1                  7            5
                 ----               ----          ---
                 1000               1498          279

Total demands should be 1500, if mean was exactly 1.5.
Proportion of demands refused = 279/1498 = 0.186.
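The car hire calculation can also be done without truncating the table at 7 demands. The following is a hedged sketch: the names are illustrative, and the refused share is computed as expected unmet demand divided by mean demand, which is one reasonable reading of the question. It gives about 0.187, close to the tabular figure obtained from 279 refusals in 1498 demands.

```python
from math import exp, factorial

MEAN_DEMAND = 1.5
CARS = 2


def poisson_pmf(m: float, x: int) -> float:
    return m ** x * exp(-m) / factorial(x)


# (i) No demand at all, so neither car is used.
p_idle = poisson_pmf(MEAN_DEMAND, 0)

# (ii) More demands than cars, so some demand is refused.
p_some_refused = 1 - sum(poisson_pmf(MEAN_DEMAND, x) for x in range(CARS + 1))

# Expected demands met per day: at most CARS of the x demands can be served.
served = sum(min(x, CARS) * poisson_pmf(MEAN_DEMAND, x) for x in range(60))
share_refused = (MEAN_DEMAND - served) / MEAN_DEMAND

print(round(p_idle, 3), round(p_some_refused, 3), round(share_refused, 3))
```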

III. The Normal Distribution

The general equation of the normal curve may be written

$$y = \frac{N}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}$$

where N = total frequency, or area under the curve, and x is a
deviation from the mean of the distribution, i.e., $(x - \bar{x})$.

Ordinates of the Normal Curve

When x = 0, $y_0 = \dfrac{N}{\sigma\sqrt{2\pi}}$ = maximum ordinate.

When x = σ, i.e., x/σ = 1, $y_\sigma = y_0 e^{-1/2} = y_0 \times 0.6067$;
when x = 2σ, i.e., x/σ = 2, $y_{2\sigma} = y_0 e^{-2} = y_0 \times 0.1353$.

Thus when deviations from the mean are standardized, $\dfrac{x - \bar{x}}{\sigma}$,
the ratio of the ordinate at any given value of $\dfrac{x - \bar{x}}{\sigma}$ (or, as it is
usually written, $\dfrac{x}{\sigma}$) to the maximum ordinate, $y_0$, is the same for all
normal curves, whatever the value of N and σ.

Writing N = 1, σ = 1, the equation of the normal curve becomes

$$y = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$$

Table I, see p. 345, is a table of values of y for this simplified
equation for a range of values of x/σ.

    E.g., x/σ = 0,  y_0    = 0.3989 = 1/√(2π)
          x/σ = 1,  tab. y = 0.2420 = 0.6067 y_0
          x/σ = 2,  tab. y = 0.0540 = 0.1353 y_0.

When N ≠ 1 and σ ≠ 1 the tabular values of y must be multiplied
by N/σ.

Example VI. In the distribution $N(\tfrac{1}{2} + \tfrac{1}{2})^n$, say $1024(\tfrac{1}{2} + \tfrac{1}{2})^{10}$:

    area = total frequency = 1024
    mean = n/2 = 5
    npq  = n/4 = 2.5
    σ    = 1.581.

The equation of the normal curve with these constants is

$$y = \frac{1024}{1.581\sqrt{2\pi}}\,e^{-x^2/5}, \quad \text{i.e.,} \quad y = \frac{1024}{1.581\sqrt{2\pi}}\,e^{-(x - 5)^2/5},$$

where x in the first form is the deviation from the mean and in the second the number of successes.

For the binomial the frequency of two successes over the mean is

$$\frac{10!}{7!\,3!} = 120$$

and the probability of two successes over the mean is

$$\frac{10!}{7!\,3!} \times \frac{1}{1024} = \frac{120}{1024}.$$

For the normal distribution the frequency of two successes over the
mean is

$$\frac{1024}{1.581\sqrt{2\pi}}\,e^{-(7 - 5)^2/5} = 116.1$$

and the probability of two successes over the mean is

$$\frac{1}{1.581\sqrt{2\pi}}\,e^{-(7 - 5)^2/5} = \frac{116.1}{1024}.$$

With larger values of n the correspondence between the two methods
is very close.

Use of Table I. Enter the table at x/σ = 2/1.581 = 1.2650.
Rough interpolation gives tabular y = 0.1792.

∴ frequency = (0.1792 × 1024)/1.581 = 116.1.

In practice the formula is never used.
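The comparison in Example VI between the binomial frequency and the normal ordinate can be reproduced directly. This sketch is illustrative; the function names are assumptions, and the normal ordinate is computed from the formula rather than from Table I.

```python
from math import comb, exp, pi, sqrt

N, n, p = 1024, 10, 0.5
mean = n * p
sigma = sqrt(n * p * (1 - p))


def binomial_frequency(x: int) -> float:
    """Frequency of x successes in the distribution N(q + p)^n."""
    return N * comb(n, x) * p ** x * (1 - p) ** (n - x)


def normal_frequency(x: float) -> float:
    """Ordinate of the normal curve with the same N, mean and sigma."""
    return N / (sigma * sqrt(2 * pi)) * exp(-((x - mean) ** 2) / (2 * sigma ** 2))


print(binomial_frequency(7), round(normal_frequency(7), 1))
```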



Example VII. The mean of a normal frequency distribution of 1000
items is 25 and its standard deviation 2.5. Use Table I to find the
ordinates of the curve at the following values of x: (i) 26.5; (ii) 21;
(iii) 19.5; (iv) 32. What is the maximum ordinate?

    (i)   x/σ = (26.5 - 25)/2.5 = 1.5/2.5, i.e., x/σ = 0.6;
          ordinate = 0.3332 × 1000/2.5 = 133.3
    (ii)  x/σ = (21 - 25)/2.5 = -1.6;
          ordinate = 0.1109 × 1000/2.5 = 44.4
    (iii) x/σ = (19.5 - 25)/2.5 = -2.2;
          ordinate = 0.0355 × 400 = 14.2
    (iv)  x/σ = (32 - 25)/2.5 = 2.8;
          ordinate = 0.0079 × 400 = 3.2

Max. ordinate = 159.6.
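The ordinates of Example VII can be checked without Table I. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); multiplying the density by N gives the ordinate of the frequency curve. The variable names are illustrative.

```python
from statistics import NormalDist

N = 1000
curve = NormalDist(mu=25, sigma=2.5)

# Ordinate of the frequency curve = N x probability density at x.
ordinates = {x: N * curve.pdf(x) for x in (26.5, 21, 19.5, 32)}
max_ordinate = N * curve.pdf(curve.mean)

for x, y in ordinates.items():
    print(x, round(y, 1))
print("maximum", round(max_ordinate, 1))
```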


Areas under the Normal Curve

In the binomial and Poisson distributions the variable is discrete;
in the normal distribution it is continuous. Table I may be used,
as has been shown, to find the value of y for any given value of x,
but of very much greater importance is the determination of the
proportions of the total area under the normal curve between any
two given values of the variable, i.e., the value of $\int_B^A y\,dx$ when
total area = 1, or $\dfrac{1}{N}\int_B^A y\,dx$ when total area = N.

Table II, see p. 346, gives the proportion of the total area which lies
between the mean ordinate and the ordinate at any standardized value,
x/σ, of the variable. This proportion is $\int_0^{x/\sigma} y\,dx$, usually symbolized $\dfrac{\alpha}{2}$.

The proportion of the total area between ordinates at $x_2$ and $x_1$,
where $x_2$ is numerically greater than $x_1$, is

$$\int_0^{x_2/\sigma} y\,dx - \int_0^{x_1/\sigma} y\,dx,$$

i.e., it is found by subtracting the proportion between the mean and $x_1$ from
the proportion between the mean and $x_2$.


Example VIII. The mean of a normal distribution is 10, and its
standard deviation is 2. N = 500.

(i) What proportion of the frequencies will exceed 11.5?

(ii) How many frequencies will exceed 11.5?

    x/σ = (11.5 - 10)/2 = 0.75.

Entering the table at x/σ = 0.75 we find α/2 = 0.2734. The proportion
exceeding 11.5 is the area to the right of the ordinate at x/σ = 0.75, i.e.,
it is 0.5 - 0.2734 = 0.2266, and the corresponding number is
0.2266 × 500 = 113.

Example IX. What percentage of the area in the above example will
lie between x = 11.5 and x = 13?

    x/σ = (13 - 10)/2 = 1.5,   α/2 = 0.4332

    proportion = 0.4332 - 0.2734 = 0.1598
    ∴ percentage = 15.98
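Examples VIII and IX can be checked with the cumulative distribution function in place of Table II. This is a minimal sketch using `statistics.NormalDist` from the standard library; the variable names are illustrative. The tabular quantity α/2 for a given x corresponds to `cdf(x) - 0.5`.

```python
from statistics import NormalDist

N = 500
curve = NormalDist(mu=10, sigma=2)

half_alpha = curve.cdf(11.5) - 0.5           # area between the mean and 11.5
tail = 1 - curve.cdf(11.5)                   # proportion exceeding 11.5
count = N * tail                             # corresponding number of items
band = curve.cdf(13) - curve.cdf(11.5)       # proportion between 11.5 and 13

print(round(half_alpha, 4), round(tail, 4), round(count), round(100 * band, 2))
```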

The following diagrams illustrate the symbols commonly used to
denote the areas of portions of the total area under the curve.

[Fig. 9. Four diagrams, I to IV, of areas under the normal curve.]

The area of the curve is 1.

The ordinate at $x_0$ bisects the area.

The area to the left of $x_1$, Fig. II, is called the "stump".

The area to the right of $x_1$, Fig. II, is called the "tail".

Proportions of Area as Probabilities

    Fig. I.    Probability that any individual item lies between x_0 and x_1 is α/2.
               Probability that any individual item is > x_1 is (1 - α)/2.
    Fig. II.   Probability that any individual item is < x_1 is (1 + α)/2.
    Fig. III.  Probability that any individual item lies between x_1 and x_2 is
               the area over base x_1 x_2.
    Fig. IV.   (i)   Probability that any individual item lies between -x_1 and x_1 is α.
               (ii)  Probability that it lies outside this range is 1 - α.
               (iii) Probability that it lies to right of x_1 is (1 - α)/2.
               (iv)  Probability that it lies to left of -x_1 is (1 - α)/2.

The reader must distinguish carefully between:

(a) questions or statements relating to the proportion of
items lying outside the range $\bar{x} \pm t\sigma$, where the probability is
the sum of the two tails, and

(b) questions relating either to the proportion to the right of
$\bar{x} + t\sigma$ or the proportion to the left of $\bar{x} - t\sigma$, i.e., to the area
of one tail only.

The statement that a deviation, tσ, is exceeded absolutely by such
and such a proportion means that it is a case of (a).


Example X. In a normal distribution N = 1000, $\bar{x}$ = 80, σ = 15.

(1) Find the number of items between 65 and 95.

Here x/σ = ±1;
for x/σ = 1, α/2 = 0.3413;
proportion = α = 0.6826
and number = 682.6, say 683.

(2) Find the probability that a randomly chosen item will be greater
than 100.

    x/σ = (100 - 80)/15 = 1.3333,   α/2 = 0.4088

    ∴ (1 - α)/2 = 0.0912 = probability.

(3) What is the least value that an item in the highest 30% can have?

    α/2 = 0.2000,   x/σ = 0.5244
    x = 0.5244 × 15 = 7.866
    i.e., lowest value = 87.866.

(4) Between what values of the variable will the middle 60% of the
frequencies lie?

There will be 30% on each side of the mean.

    α/2 = 0.3000,   x/σ = 0.8418
    x = 0.8418 × 15 = 12.627.

Limits are 67.37 and 92.63.

(5) The probability that a certain value of the variable will be
exceeded is 5%. What is the value?

    (1 - α)/2 = 0.05, i.e., α/2 = 0.45,   x/σ = 1.6450.

    ∴ value is 80 + 24.675.

(6) The probability that a certain deviation from the mean will be
exceeded is 5%. What is the deviation?

(a) If this is read to mean "exceeded" in the ordinary sense,
then (1 - α)/2 = 0.05 and the deviation = +24.675.

(b) If the question means a bigger deviation both in the positive
and negative senses, then for each tail (1 - α)/2 = 0.025, i.e.,
α/2 = 0.4750, x/σ = 1.96.

    ∴ the deviation = ±(1.96 × 15) = ±29.4.

(7) What are the quartiles of the distribution?

Between the median (= mean) and the third quartile there are 25%
of the frequencies.

    α/2 = 0.25
    x/σ = 0.6745
    x = 0.6745 × 15 = 10.1175
    i.e., Q_1 = 80 - 10.1175
          Q_3 = 80 + 10.1175.
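The several parts of Example X amount to reading the normal table in both directions: from a value to an area, and from an area back to a value. The sketch below is an illustration using `statistics.NormalDist` (`cdf` for the first direction, `inv_cdf` for the second); the variable names are assumptions, and small differences from the text arise from four-figure tables.

```python
from statistics import NormalDist

N = 1000
curve = NormalDist(mu=80, sigma=15)

between_65_95 = N * (curve.cdf(95) - curve.cdf(65))        # part (1)
p_over_100 = 1 - curve.cdf(100)                            # part (2)
least_of_top_30 = curve.inv_cdf(0.70)                      # part (3)
middle_60 = (curve.inv_cdf(0.20), curve.inv_cdf(0.80))     # part (4)
exceeded_by_5pct = curve.inv_cdf(0.95)                     # part (5), one tail
two_tail_deviation = curve.inv_cdf(0.975) - curve.mean     # part (6b), both tails
quartiles = (curve.inv_cdf(0.25), curve.inv_cdf(0.75))     # part (7)

print(round(between_65_95), round(p_over_100, 4), round(least_of_top_30, 3))
print([round(v, 2) for v in middle_60], round(exceeded_by_5pct, 3))
print(round(two_tail_deviation, 1), [round(q, 4) for q in quartiles])
```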

Example XI. In a distribution, which is exactly normal, 31% of the
items are under 45 and 8% are over 64. Find the mean and standard
deviation of the distribution.

Between ordinates at 45 and $\bar{x}$ lies 19% of area.

For α/2 = 0.1900, x/σ = 0.4958, i.e., $\bar{x}$ - 45 = 0.4958σ.

Between ordinates at $\bar{x}$ and 64 lies 42% of area.

For α/2 = 0.4200, x/σ = 1.4053, i.e., 64 - $\bar{x}$ = 1.4053σ.

    ∴ 64 - 45 = 0.4958σ + 1.4053σ
           19 = 1.9011σ
           10 = σ
    x̄ - 45 = 0.4958 × 10
         x̄ = 50.

Some Properties of the Normal Curve

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}$$

$$\frac{dy}{dx} = -\frac{x}{\sigma^3\sqrt{2\pi}}\,e^{-x^2/2\sigma^2} \qquad (1)$$

$$\frac{d^2y}{dx^2} = \frac{1}{\sigma^3\sqrt{2\pi}}\left(\frac{x^2}{\sigma^2} - 1\right)e^{-x^2/2\sigma^2} \qquad (2)$$

(i) Putting $\dfrac{dy}{dx} = 0$, y has a maximum value of $\dfrac{1}{\sigma\sqrt{2\pi}}$. This
is a maximum, since putting x = 0 in (2) makes $\dfrac{d^2y}{dx^2}$ negative.

(ii) Putting $\dfrac{d^2y}{dx^2} = 0$ gives x = ±σ, i.e., the points of inflection
of the curve are at x = ±σ.

(iii) The "probable error", or quartile deviation, is ±0.6745σ.

(iv) The mean deviation is 0.7979σ = $\sigma\sqrt{2/\pi}$.

In the normal distribution σ : M.D. : Q.D. = 1 : 0.7979 : 0.6745.
In distributions of the moderately skew type the three measures of
dispersion may be expected to be, roughly, in the ratio 1 : 0.8 : 0.67.

(v) The Moments of the Normal Distribution. Odd moments
are zero, since the distribution is symmetrical.

$$\mu_2 = \sigma^2; \qquad \mu_4 = 3\sigma^4$$

$$\beta_1 = 0 \quad \text{and} \quad \beta_2 = \frac{3\sigma^4}{\sigma^4} = 3.$$

In deciding whether, or not, it is appropriate to "fit" a normal
curve to a given distribution the values of $\beta_1$ and $\beta_2$ are first
calculated from the distribution.

If $\sqrt{\beta_1}$ and $\beta_2$ do not differ "significantly" from 0 and 3,
respectively, the given distribution is of the normal type. If $\sqrt{\beta_1}$
is significantly different from 0, the distribution is skew. If $\beta_2$ so
differs from 3, then if $\beta_2 > 3$ the curve is more sharply peaked than
the normal, or leptokurtic, whereas if $\beta_2 < 3$ it is flatter topped, or
platykurtic. The normal curve being taken as the standard,
$\beta_2 - 3$ is referred to as the excess of kurtosis.

[Fig. 10. Horizontal axis: values of x/σ, from -3 to 3.]





BINOMIAL, POISSON AND NORMAL DISTRIBUTIONS 243 


Determination of Areas under the Normal Curve 

The value of the definite integral A = (1/√(2π)) ∫₀ᵗ e^(−x²/2) dx can be
found, for small values of t, to a reasonable degree of approximation by
expanding e^(−x²/2) and integrating the first few terms.

Example XII. Find the area of the normal curve over the range
x = 0 to x = 1 (i.e., x = σ).

Area = (1/√(2π)) ∫₀¹ e^(−x²/2) dx

     = (1/√(2π)) ∫₀¹ (1 − x²/2 + x⁴/8 − x⁶/48 + x⁸/384 − . . .) dx

     = (1/√(2π)) [x − x³/6 + x⁵/40 − x⁷/336 + x⁹/3456 − . . .]₀¹

     = 0.3989 (1 − 1/6 + 1/40 − 1/336 + 1/3456)

     = 0.3413.
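
The term-by-term integration of Example XII can be checked numerically. The sketch below is only an illustration: the function name `normal_area_series` is invented here, and the comparison value uses the error function from Python's standard library rather than Table II.

```python
import math

def normal_area_series(t, terms=5):
    """Approximate (1/sqrt(2*pi)) * integral from 0 to t of exp(-x^2/2) dx
    by integrating the power series of exp(-x^2/2) term by term."""
    total = 0.0
    for k in range(terms):
        # The k-th term of exp(-x^2/2) is (-1)^k x^(2k) / (2^k k!);
        # integrated from 0 to t it becomes (-1)^k t^(2k+1) / ((2k+1) 2^k k!).
        total += (-1) ** k * t ** (2 * k + 1) / ((2 * k + 1) * 2 ** k * math.factorial(k))
    return total / math.sqrt(2 * math.pi)

area = normal_area_series(1.0)             # five terms, as in Example XII
exact = 0.5 * math.erf(1 / math.sqrt(2))   # same area from the error function
print(area, exact)
```

With t = 1 the five denominators produced by the loop are 1, 6, 40, 336 and 3456, the same as those in the worked example.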




Fitting a Normal Curve to Observed Data 
Example XIII. Calculate the frequencies of the normal distribution 
which has the same total frequency, mean and standard deviation as 
the following distribution, for the intervals 10-12, 12-14, etc. 
x:  10-  12-  14-  16-  18-  20-  22-  24-  26-

f:    4   30  106  206  272  219  120   37    6     Σf = 1000

x̄ = 19.14; σ = 1.445, in class interval units.


The calculation may be arranged as follows:


   (1)        (2)        (3)         (4)          (5)          (6)
 Interval     x/σ      Area to    Proportion   Col. (4)    Frequencies
  limit                 right      of area      × 1000    of the normal
                                                           distribution

   −∞          −∞       1.0000      0.0068         6.8           7
   12-      −2.4706     0.9932      0.0308        30.8          31
   14-      −1.7785     0.9624      0.1011       101.1         101
   16-      −1.0865     0.8613      0.2079       207.9         208
   18-      −0.3945     0.6534      0.2704       270.4         270
   20-      +0.2976     0.3830      0.2218       221.8         222
   22-      +0.9896     0.1612      0.1149       114.9         115
   24-      +1.6817     0.0463      0.0375        37.5          37
   26-      +2.3737     0.0088      0.0088         8.8           9
   +∞          +∞       0.0000
                                                ------        ----
                                                1000.0        1000

Explanation


Col. (1). In the given distribution the frequencies range from 10 to 28,
but as in the normal distribution the range is infinite, −∞ replaces 10
and +∞ replaces 28, so that the whole of the normal distribution may
be represented.

Col. (2). Find the distance, in terms of σ, from the mean to the
lower limit of each interval, working in class interval units.

Thus  (12 − 19.14)/σ = −3.57/1.445 = −2.4706
      (20 − 19.14)/σ = +0.43/1.445 = +0.2976.

Col. (3). For x/σ = −2.4706, (1 + a)/2 = 0.9932
                  = −1.7785,           = 0.9624.

For positive values of x/σ, area to right is now (1 − a)/2.

For x/σ = +0.2976, (1 − a)/2 = 0.3830, and so on.

Col. (4). The entries in this column are obtained by subtraction.
Thus area from −∞ to 12 = 1.0000 − 0.9932
       „       12 to 14 = 0.9932 − 0.9624.

Col. (5). Finally, the proportional areas of col. (4) are multiplied by
1000, i.e., total area, to find the actual area over each interval.

Col. (6). The required frequencies are then the entries in col. (5)
written to the nearest whole number.
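
The six columns of Example XIII can be reproduced in a few lines. This is a sketch under stated assumptions: the areas come from the standard library's complementary error function instead of Table II, so col. (5) may differ from the printed figures in the last decimal, and the helper name `area_to_right` is invented for the illustration.

```python
import math

def area_to_right(z):
    """Proportion of the normal curve lying to the right of the deviate z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# sigma = 1.445 class intervals of width 2, i.e. 2.89 in the units of x
mean, sigma, total = 19.14, 2.89, 1000
lower_limits = [12, 14, 16, 18, 20, 22, 24, 26]

# Col. (3): area to the right of each limit, with 1 and 0 at the infinite ends.
right = [1.0] + [area_to_right((x - mean) / sigma) for x in lower_limits] + [0.0]

# Cols. (4) and (5): successive differences, scaled by the total frequency.
expected = [total * (right[i] - right[i + 1]) for i in range(len(right) - 1)]

labels = ["below 12"] + [f"{x}-" for x in lower_limits]
for label, e in zip(labels, expected):
    print(f"{label:>9}  {e:6.1f}")
```

A class whose expected value falls almost exactly on a half, such as 24- here, may round either way depending on the precision of the table used.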


Different Forms of the Normal Curve Equation 

Three different measures of dispersion are used by different
writers in setting down the equation to the normal curve, to the
added confusion of students. They are:

    σ, the standard deviation;
    h, the “precision”;
    c, the “modulus”.

The relationships between these are

    h = 1/(σ√2)   or   σ = 1/(h√2)

    c = σ√2       or   σ = c/√2

    using h,   y = (h/√π) e^(−h²x²)

    using c,   y = (1/(c√π)) e^(−x²/c²).


Example XIV. A coin is tossed 100 times. What is the probability 
(i) of exactly 62 heads, (ii) of 62 or more heads, assuming that the coin 
is unbiased ? 





(i) Mean, or expected, number of heads = 50

σ = √(½ × ½ × 100) = 5

x/σ = 12/5 = 2.4. Tabular y = 0.0224; area under curve = 1.

P(62) = 0.0224/σ = 0.0224/5 = 0.0045.

(ii) To use Table II to find the probability of 62 or more heads
it is necessary to regard the variable as continuous; i.e., “62”
must be taken to cover the interval 61.5 to 62.5.

x/σ = (61.5 − 50)/5 = 2.3;   a/2 = 0.4893;   (1 − a)/2 = 0.0107

∴ required probability = 0.011.

Note that the probability of a deviation of 12 from the mean, i.e., of
62 or more heads or of 38 or fewer heads, would be 2(0.0107).
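
As a check on Example XIV, the normal approximations can be set beside the exact binomial terms. The sketch below assumes nothing beyond the data of the example; the variable names are chosen for the illustration, and the normal ordinate and tail are computed directly instead of being read from Tables I and II.

```python
import math

n, p = 100, 0.5
mean = n * p                          # 50
sigma = math.sqrt(n * p * (1 - p))    # 5

def normal_tail(z):
    """Area of the normal curve to the right of the deviate z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# (i) exactly 62 heads: normal ordinate at x/sigma = 2.4, divided by sigma
ordinate = math.exp(-((62 - mean) / sigma) ** 2 / 2) / math.sqrt(2 * math.pi)
approx_point = ordinate / sigma
exact_point = math.comb(n, 62) * p ** n

# (ii) 62 or more heads: tail beyond 61.5, against the exact binomial sum
approx_tail = normal_tail((61.5 - mean) / sigma)
exact_tail = sum(math.comb(n, k) for k in range(62, n + 1)) * p ** n

print(approx_point, exact_point)
print(approx_tail, exact_tail)
```

The continuity correction in part (ii) is what keeps the tail approximation close to the exact sum.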

Example XV. Four hundred coins are tossed 1000 times. How
many times in the 1000 trials would you expect to get (i) 223 or more
heads; (ii) 174 or fewer heads; (iii) 231 or more heads, or 169 or fewer
heads?

Mean no. of heads per tossing = 200
σ = 10.

(i) x/σ = (222.5 − 200)/10 = 2.25; (1 − a)/2 = 0.0122, i.e., about 12
times.

(ii) x/σ = (200 − 174.5)/10 = 2.55; (1 − a)/2 = 0.0054, i.e., about 5
times.

(iii) x/σ = 30.5/10 = 3.05; (1 − a)/2 = 0.0011;

∴ total probability is 2(0.0011), i.e., about twice in 1000 trials.

Example XVI. In a certain “thought-transference” test a pack of
cards with 5 different suits is used. The subject has to name the suit
of a card picked at random and shown to the thought transferer in
another room. On one occasion 228 correct results were obtained in a
thousand such trials. Does this indicate that the subject was not
merely guessing?

If he was guessing the chance of a correct result at a single trial was
1/5, so that the expected number of correct results in 1000 trials was
200: his score was 28 over the mean.

The probability of exactly 228 correct answers, if the subject is
guessing, is very small, only about 0.0028, or less than three times in
1000 trials.

But this underrates the probability, and what is wanted is his chance
of getting 28 or more correct answers over the mean.

x/σ = 27.5/12.65 = 2.174

(1 − a)/2 = 0.0152, i.e., the probability is about

Example XVII. With the data of Example XVI, find x, if the probability
of x correct answers over the mean is approximately 0.001.

This requires that (1 − a)/2 shall be 0.001.

The corresponding x/σ = 3.09; i.e., x/12.65 = 3.09

∴ x = 39.1

∴ no. of correct answers required is 239.

Probability of 239 or more is actually 0.0012.

When n, i.e., the number of trials in a test, is small, the probability
of the result being due to guessing may be found as in the following
example.

Example XVIII. A diviner claims to be able to detect the proximity 
of metal by means of a hazel wand. Ten similar boxes are set out at 
intervals, five of which contain metal and five of which do not. 

The test can be regarded from two points of view. 

(i) In the case of each box the diviner is required to state
“metal” or “not metal”. How many correct results must he
achieve to lend support to his claim?

(½ + ½)¹⁰

Probability of 10 correct results = 0.001
     „          9       „        = 0.010
     „          8       „        = 0.044
     „          7       „        = 0.117

 7 or more correct results could be expected in about 17% of trials
 8 or more     „       „       „       „       „      5½%     „
 9 or more     „       „       „       „       „      1%      „
10             „       „       „       „       „      0.1%    „


Nine correct results by guessing would therefore be unlikely 
and 10 correct results by guessing would be very unlikely, and 
would suggest that the diviner should certainly be given further 
and severer tests. 

(ii) The diviner is asked to pick out the five boxes containing 
metal. 


P(5) = 1/¹⁰C₅ = 1/252
P(4) = (⁵C₄ × ⁵C₁)/¹⁰C₅ = 25/252

obviously he will have to identify all five boxes for his claim to
merit further consideration.
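
Both views of the ten-box test in Example XVIII reduce to counting combinations. The following sketch uses `math.comb`; the names `exact`, `at_least` and `picks_correct` are invented for the illustration.

```python
from math import comb

# (i) Each box is called "metal" or "not metal" at random:
#     a binomial distribution with n = 10 and p = 1/2.
n = 10
exact = {k: comb(n, k) / 2 ** n for k in range(7, n + 1)}
at_least = {k: sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
            for k in range(7, n + 1)}

# (ii) Five boxes are picked out of ten, five of which hold metal:
#      the number of correct picks is hypergeometric.
def picks_correct(k, metal=5, empty=5, chosen=5):
    """Probability that exactly k of the chosen boxes contain metal."""
    return comb(metal, k) * comb(empty, chosen - k) / comb(metal + empty, chosen)

for k in range(7, n + 1):
    print(k, round(exact[k], 3), round(at_least[k], 3))
print(picks_correct(5), picks_correct(4))
```

Changing `n`, `metal`, `empty` and `chosen` gives the corresponding figures for other numbers of boxes.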

If the number of boxes had been 20, with 10 containing metal, then 
by the first method we have: 

(i) Probability of 14 or more correct answers out of 20 = 0.059
         „         15    „       „       „       „    = 0.022
         „         16    „       „       „       „    = 0.007

(ii) The reader should calculate the probability of identifying 10
boxes correctly, 9 or more, 8 or more, as an exercise.

An alternative method of testing the “significance” of such results
as the above, the χ² test, is discussed in Chapter X.

IV. Moment-generating Functions 

The following section illustrates the use of moment-generating
functions * in a simple case.

The generating function of probability, where φ(x) is the probability
function, is:

G(t) = Σ φ(x) t^x, where the variable is discrete,
and  = ∫ φ(x) t^x dx, for a continuous variable.

The moment-generating function is obtained by putting
t = e^a in the generating function of probability:

M(a) = Σ φ(x) e^(ax)
or   = ∫ φ(x) e^(ax) dx.

Consider M(a) = ∫ e^(ax) φ(x) dx, taken from x = A to x = B.

This integral is a function of a only; a, like t in the g.f. of
probability, has no meaning; it is merely a mathematical tool.

Assuming that φ(x) is a function for which this integral exists and
that differentiation term by term is admissible, then

M(a) = ∫ φ(x) dx + a ∫ x φ(x) dx + (a²/2!) ∫ x² φ(x) dx + . . .

     = ν₀ + aν₁ + (a²/2!)ν₂ + (a³/3!)ν₃ + . . .

i.e., the coefficient of a^r/r! in this expansion is the rth moment about
the origin.

* For a short account of m.g.f. see Mathematical Statistics, C. E.
Weatherburn (C.U.P.), and Statistical Mathematics, A. C. Aitken (Oliver
and Boyd).

If the above expression is differentiated repeatedly, then it will
be seen that

ν_r = d^r M/da^r, evaluated at a = 0,

i.e., M′(0) = ν₁, M″(0) = ν₂, etc.


The Moments of the Binomial Distribution

G(t) = Σ ⁿCₓ p^x q^(n−x) t^x
     = (q + pt)^n

M(a) = (q + pe^a)^n

M′(a) = npe^a (q + pe^a)^(n−1)
      = np(q + p)^(n−1) when a = 0

i.e., ν₁ = np

M″(a) = npe^a (q + pe^a)^(n−2) (q + npe^a)
      = np(q + np) when a = 0

i.e., ν₂ = npq + n²p²
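
The two results ν₁ = np and ν₂ = npq + n²p² can be confirmed by summing x P(x) and x² P(x) directly. This is a numerical check only, not the generating-function argument itself; the function name `raw_moments` and the trial values n = 12, p = 0.3 are chosen for the illustration and do not come from the text.

```python
from math import comb

def raw_moments(n, p):
    """First and second moments about zero of the binomial (q + p)^n,
    found by direct summation over x = 0, 1, ..., n."""
    q = 1 - p
    pmf = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]
    nu1 = sum(x * px for x, px in enumerate(pmf))
    nu2 = sum(x * x * px for x, px in enumerate(pmf))
    return nu1, nu2

n, p = 12, 0.3   # illustrative values only
nu1, nu2 = raw_moments(n, p)
print(nu1, n * p)
print(nu2, n * p * (1 - p) + n ** 2 * p ** 2)
```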


Exercise 9 

1. (i) Calculate the ordinates of the binomial distribution 2^n(½ + ½)^n, when
n = 12.

(ii) Calculate the ordinates of the normal curve which has the total
frequency, mean and standard deviation of the above binomial.

2. Twelve coins are tossed. What are the probabilities, in a single tossing,
of getting (i) 9 or more heads, (ii) less than 3 heads, (iii) at least 8 heads?
If the 12 coins are tossed 4096 times, on how many occasions would you expect
there to be (iv) less than 3 heads, (v) at least 2 heads, (vi) exactly 6 heads?

3. Use Table II to show that the quintiles of a normal curve are −0.842σ,
−0.253σ, +0.253σ and +0.842σ, from zero mean, and that the quartiles
are −0.674σ and +0.674σ.

4. Assuming that one in 80 births is a case of twins, calculate the probability
of two or more sets of twins in a certain town on a day when 30 births occur.
Compare the results obtained by using (i) the binomial series and (ii) Poisson's
approximation.

5. If the average number of rejects in the manufacture of a certain article
is 4%, what are the probabilities of 0, 1, 2, 3, 4 rejects in a sample of 10
articles taken at random?

6. A die is thrown 960 times. The expected number of “sixes” is 160.
What is the probability that the number of sixes thrown will be 170 or over,
assuming the die to be unbiased?

7. If the birth sex ratio is 49 girls : 51 boys, find the probability of there
being 8 or more girls amongst 10 babies born on the same day in a maternity
hospital.

8. A skilled typist, on routine work, kept a record of mistakes made per
day during 300 working days.

Mistakes per day:    0    1    2    3    4    5    6

No. of days:       143   90   42   12    9    3    1



Compute the frequencies of the Poisson series which has the same total
frequency and mean as the above distribution.

9. Find the first and second moments about zero, ν₁ and ν₂, of the binomial
distribution, by calculating Σ x P(x) and Σ x² P(x), in the manner of Section I(i)
of this Chapter.

10. In the same way find ν₁ and ν₂, with origin zero, of the Poisson
distribution.

11. If out of 100,000 males born, 85,824 survive to reach exact age 25 and
78,357 to reach exact age 40, what is the probability that, in a random group
of 20 men aged exactly 25, just 4 will die before they reach the age of 40?

12. The probability that a man aged exactly 50 will die within the year is
0.01125. What is the probability that of 12 such men at least 11 will reach
their fifty-first birthday?

13. Taking the probability that a man aged 50 years will die within the
year as fa, find the probability that in a group of 42,000 such persons there
should be a deviation of 30 or more from the expected number.

14. Find the average and variance in terms of p, q, n for the binomial
distribution

P(r) = {n!/(r!(n − r)!)} p^r q^(n−r)

Compute these to the nearest whole number from the following data, and
hence obtain appropriate values of p, q, n.


   r.    P(r).        r.    P(r).        r.    P(r).

   2     0.001        9     0.136       16     0.016
   3     0.004       10     0.140       17     0.008
   4     0.013       11     0.127       18     0.004
   5     0.030       12     0.103       19     0.002
   6     0.055       13     0.075       20     0.001
   7     0.087       14     0.050
   8     0.117       15     0.030

                                  (L.U. B.Sc. Econ., 1938)


15. Sixty-four coins are tossed 1000 times. In how many of these tossings
would you expect that the number of heads would be (i) 40 or more, (ii) 22 or
less?

16. Assuming that the average number of defective articles in a certain
manufacturing process is 3%, find the theoretical frequency of samples of fifty
with 0, 1, 2, 3, . . . defectives.

17. Certain mass-produced articles, of which 0.5 per cent. are defective, are
packed in cartons each containing 100. What proportion of cartons are free
from defective articles, and what proportion contain 2 or more defectives?

(R.S.S. Certificate, 1947)

18. The probability distribution of a variable r is given by the formula
P(r) = ¹⁶Cᵣ(½)¹⁶, where r can take any of the integral values 0, 1, 2, . . . 15, 16.

What is the formula φ(r) of the normal distribution with the same mean
and variance as P(r)? Find the values of P(5) and

19. A man, who claims to be able to detect mental deficiency from
handwriting, is given 12 scripts, 6 of which have been written by mental
defectives. (i) If he states correctly to which category 8 of the 12 belong or
(ii) if he picks out 6 scripts as those written by mental defectives and 5 of
them have been so written, would you consider that either result supported
his claim? If not, state, in each case, the minimum result that you would so
regard.

20. In a test of 50 questions, where the subject is required to answer only
“Right” or “Wrong” to each question, what is the probability (i) of 30 or
more, (ii) 35 or more correct answers, if the subject is guessing?

21. In the test of Question 20, the number of correct answers out of 50
given on one occasion was such that that number, or more, could be expected
about once in 100 trials, if the answers were guesses. How many correct
answers were given?

22. A normal distribution has mean 5 and standard deviation 3. What is
the probability that the deviation from the mean of an item taken at random
will have a negative value?

23. Find the probability that an item drawn at random from the distribution
of Question 22 will be between (i) 2.57 and 4.34; (ii) −1.24 and 1.37; (iii) 3.74
and 6.8.

If the probability of a deviation from the mean greater than x is 0.063,
what is x?

24. The mean I.Q. of a large number of children of age 14 was 100 and the
standard deviation of the distribution was 16. Assuming that the distribution
was exactly normal, find:

(i) What percentage of the children had an I.Q. (a) under 70, (b) under
80, (c) over 110, (d) over 140.

(ii) Between what limits the I.Q.s of the middle 34% of the children lay.

(iii) What percentage of the children had I.Q.s within the range
x̄ ± 1.96σ.

25. In the distribution of Question 24, state

(i) The percentage of children with I.Q.s within the range x̄ ± 3.09σ.

(ii) The range which would include 98% of the children.

(iii) The range which would include 99% of the children.

26. The mean of a normal distribution is 50 and the standard deviation is
12.5. Calculate the sixtieth, sixty-fifth, eighty-fifth, ninetieth and
ninety-fifth percentiles.

27. In the distribution of Question 26 what percentile is about the same 
sigma distance above the seventieth percentile as the ninetieth percentile is 
above the eightieth ? 

28. The following table gives frequencies of occurrence of a variate x 
between certain limits. 

Variate ( x) Frequency 

Less than 40 . . . 30 

40 or more but less than 50 . 33 

50 and more .... 37 


Total .... 100 

The distribution is exactly normal. Find the average and standard 
deviation of x, and hence the frequency between x = 30 and x = 40, and 
between x = 50 and x = 60. (L.U. B.Sc. Econ., 1937)

29. In a distribution, exactly normal, 7% of the items are under 35 and
89% are under 63. What are the mean and standard deviation of the 
distribution ? 



30. Calculate the frequencies of the normal distribution which has the same
mean, standard deviation and total frequency as the distribution given below,
for the intervals 60-, 65-, etc.

x:  60-  65-  70-  75-  80-  85-  90-  95-

f:    3   21  150  335  326  135   26    4     Σf = 1000

31. Repeat the calculations of Question 30 with the following data:

x:  80-  81-  82-  83-  84-  85-  86-  87-  88-  89-

f:    1    8   35   82  122  124   83   34   10    1     Σf = 500

32. Show that the points of inflection of the normal curve
y = (1/√(2π)) e^(−x²/2) are at x = ±1.

Show also that the proportion of the area over the range x = 0 to x = 0.5
is 0.1915.

33. Show that the fourth moment about the mean for the Poisson distribution
e^(−m)(1 + m + m²/2! + . . .) is μ₄ = m(1 + 3m). Hence compute the
fourth moment of the following Poisson series.

x:            0       1       2      3      4      5      6     Total.

Frequency:  54.335  33.145  10.110  2.050  0.315  0.040  0.005   100

(L.U. B.Sc. Econ., 1945)

34. Find also the factorial moments of the distribution of Question 33 by
direct calculation and convert them to moments about the mean.

35. For the following distribution compute the standard deviation and the
measure of kurtosis μ₄/μ₂², applying the appropriate corrections for grouping.
Comment on the meaning of your result.

Wages (shillings):  20-  25-  30-  35-  40-  45-  50-  55-  Total.

Number earning:       1    8   25   35   24    5    1    1    100

(L.U. B.Sc. Econ., 1946)

36. A manufacturer sells an article at a fixed price of £1. He guarantees
to refund the purchase money to any purchaser who finds that the weight of
his article is less than 8 oz. Actually the weights of the articles made by the
manufacturer are normally distributed with standard deviation 1 oz. His
cost of production, £c per article, is related to the mean weight, w oz., of the
articles by the relation c = 0.05w + 0.30.

Graphically, or otherwise, determine the mean weight which the
manufacturer should aim at, if he wishes to maximise his expected profit per
article.

(R.S.S. Certificate, 1948)

37. Derive the Poisson distribution Ne^(−m)(1 + m + m²/2! + . . .) as a
limiting form of the binomial distribution N(q + p)^n. Find the mean and
standard deviation for the table of deaths of women over 85 years old recorded
in The Times in a three-year period.

No. of deaths recorded on the day:    0    1    2    3    4    5    6    7

No. of days:                        364  376  218   89   33   13    2    1

Find the expected number of days with “one death recorded” for the
Poisson series fitted to the data. (log₁₀ e = 0.4343.)

(R.S.S. Certificate, 1947)



CHAPTER X 


SIGNIFICANCE TESTS 

The study of samples has two objects in view:

(i) To compare observation with some hypothesis, or
theoretical expectation, and to discover how far the difference
between the two can be attributed to chance, i.e., to the fluctuations
of random sampling. This is the problem of significance.

(ii) To use the “statistics” obtained from the sample as
estimates of the unknown parameters of the population from
which the sample is drawn. This is the problem of estimation.

This chapter is concerned almost wholly with the simpler forms of
significance test.

Some of the significance tests in common use are applicable to
“large” samples only; large is a vague term but it may be taken
that such a sample would have a minimum of 50 items. The other
tests are applicable to samples of any size, though they are mainly
used in connection with small samples. The use of the probability
integral table of the normal distribution (Table II) is proper only
with large samples, but as it is the most complete of the tables used
in significance tests, it is convenient to use it when possible.

I. Large Samples 

(a) Attributes

A proportion, p, of a population has a certain attribute, or
characteristic. The expected number of “successes” in a simple * sample
of n items, i.e., the number of items having this attribute, is pn.
The standard deviation, or standard error as it is usually termed,
of the number of successes is √(npq). The standard error of the
proportion of successes, p, in a sample of n items is √(npq)/n, i.e.,
√(pq/n). The use of Table II to determine the probability of a
deviation of x or more successes from the mean, np, has been
illustrated. It should be noted that the use of this Table assumes
that the standardized deviate (x − np)/√(npq) is distributed normally,
or approximately so.

* Simple sampling is random sampling when p remains the same at each
trial. In the following pages a “random” sample may be taken to mean a
“simple” sample unless the contrary is stated.


Levels of Significance 

A coin is tossed n times and turns up heads x times. Does this
result suggest that the coin is biased?

A “null hypothesis” is set up, viz., that the coin is unbiased,
i.e., that the difference from expectation is due to chance. If
(x − np)/√(npq) = 1.96, the difference could be exceeded by chance,
with an unbiased coin, only 5 times in 100 such experiments, and
the difference is said to be significant at the 0.05, or 5%, level.
If (x − np)/√(npq) = 2.58, the difference could be exceeded by
chance only once in 100 such experiments, and is therefore said
to be significant at the 0.01 level.

These two levels of significance are commonly adopted; they are,
of course, purely conventional. Deviations of 2σ and 3σ correspond
to levels of 0.0456 and 0.0027 respectively, a deviation of 2σ being
regarded as significant, or probably significant, and one of 3σ as
highly significant. A deviation in the neighbourhood of 3σ indicates,
in fact, that either (1) a very unlikely event has come off, or (2) that
the null hypothesis is most likely false.

It should be noted that the occurrence of a highly improbable
difference does not “disprove” the hypothesis in any strict sense,
since improbable events do happen. On the other hand, if the
probability of exceeding a difference is well above 5%, i.e., if the
difference is not significant, all that can be said is that the
experiment gives no reason to doubt the hypothesis; in no sense does it
“prove” the hypothesis.
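
The levels quoted for deviations of 1.96σ, 2.58σ, 2σ and 3σ are two-sided areas of the normal curve. A minimal sketch, assuming the complementary error function as a stand-in for Table II (the function name `two_sided_level` is invented here):

```python
import math

def two_sided_level(z):
    """Probability that a normal deviate exceeds z in absolute value."""
    return math.erfc(z / math.sqrt(2))

for z in (1.96, 2.58, 2.0, 3.0):
    print(z, round(two_sided_level(z), 4))
```

The value for 2σ agrees with the text to within one unit in the fourth decimal place, a difference that comes from table rounding.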


Example I. A coin is tossed 900 times and heads appear 490 times. 
Does this result support the hypothesis that the coin is unbiased ? 

On the hypothesis of an unbiased coin: 


(a) s.e. of the number of successes is √(½ × ½ × 900) = 15

    (490 − 450)/15 = 2.67.

(b) s.e. of the proportion of successes = √(½ × ½/900) = 0.0167.

The difference between the observed and expected proportions is
490/900 − ½ = 0.0444, and 0.0444/0.0167 = 2.67, as before.

The result of the experiment, therefore, does not support the
hypothesis.

Example II. An ordinary die is thrown 1800 times. If an ace or
two turns up 635 times, does this indicate that the true probability of
an ace or a two with this die is not ⅓?

On the hypothesis that the die is unbiased, the expected number is
600.

The s.e. of the number = √(⅓ × ⅔ × 1800) = 20

difference/s.e. = 35/20 = 1.75.

Or s.e. of proportion = √(⅓ × ⅔/1800) = 1/90.

The difference between the proportions = (635 − 600)/1800 = 35/1800

(35/1800) ÷ (1/90) = 1.75

i.e., the experiment provides no reason for doubting the hypothesis.


Estimation of the proportion, π, in the population from the proportion,
p, in a random sample of n items.

When n is large, the assumption that p − 1.96 s.e.(p) ≤ π ≤ p + 1.96 s.e.(p)
will be in error only 5 times in 100 random samples of size n.

Similarly, the probability that a sample value of p will be obtained
such that π will lie outside the range p ± 2.58 s.e.(p) is only 1 in 100.

Example III. A random sample of 900, taken from a large bulk of
mass-produced screws, has 5% of its items defective. What information
can be inferred about the percentage of defectives in the bulk?

p = 1/20; s.e.(p) = √((1/20 × 19/20)/900) = 0.0073.

The probability is 0.95 that π is in the range 0.05 ± 1.96(0.0073),
i.e., between 0.064 and 0.036.
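
The range in Example III follows from s.e.(p) = √(pq/n). A short sketch, with the function name `proportion_limits` and its `multiplier` argument chosen for the illustration:

```python
import math

def proportion_limits(p, n, multiplier=1.96):
    """Return s.e.(p) = sqrt(p*q/n) and the range p -/+ multiplier * s.e.(p)."""
    se = math.sqrt(p * (1 - p) / n)
    return se, p - multiplier * se, p + multiplier * se

se, low, high = proportion_limits(0.05, 900)
print(round(se, 4), round(low, 3), round(high, 3))
```

Passing `multiplier=2.58` or `multiplier=3` gives the wider ranges discussed in the text.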

Example IV. A biased coin is thrown 500 times and comes down
heads 300 times. What is the probability of a head in a single throw
of this coin?

p = 0.6; s.e.(p) = √(0.6 × 0.4/500) = 0.022.

The probability that π is outside the range 0.6 ± 3(0.022) is less than
0.003.

It should be noted that the maximum s.e.(p) in the sample of 500 items

= √(½ × ½/500) = 0.0224.

This may be used instead of the actual s.e. of the observed proportion
if greater certainty that π is included in a given range is required.

Example V. Suppose the coin in Example IV had been thrown 2000
times and the same value of p had been obtained.

Then s.e.(p) = √(0.6 × 0.4/2000) = 0.011, i.e., ½ of its value in Example IV;
that is to say, the s.e. of the proportion varies inversely as the square root
of n.

The 3σ range then becomes 0.6 ± 3(0.011) or half its former width.


Comparison of Two Samples 

Standard Error of the Difference between the Proportions
Let p₁ and p₂ be the proportions having attribute x in random
samples of n₁ and n₂ individuals, respectively, drawn from different
populations.

(i) Assume that the proportions in the two populations are really
the same, the difference between p₁ and p₂ being due to the fluctuations
of random sampling only.

Then the best estimate, p, of the actual proportion in the
populations is

p = (n₁p₁ + n₂p₂)/(n₁ + n₂).

The s.e. of the proportion in the first sample = √(pq/n₁),

in the second sample = √(pq/n₂),

and the s.e. of the difference, p₁ − p₂, = √{pq(1/n₁ + 1/n₂)}.

Then, conventionally, if (p₁ − p₂)/s.e.diff < 2, the difference may be
regarded as due to random sampling variations, i.e., as not
significant.

(ii) Assume that the difference between p₁ and p₂ is real, i.e., that
π₁ and π₂ are actually different.

Could this real difference be hidden as the result of sampling
fluctuations in samples of n₁ and n₂ members respectively?

s.e.(p₁) = √(p₁q₁/n₁);  s.e.(p₂) = √(p₂q₂/n₂)

and s.e. of difference, (p₁ − p₂), = √(p₁q₁/n₁ + p₂q₂/n₂).

Then if (p₁ − p₂)/s.e.diff is less than about 3, the difference may be
hidden by sampling fluctuations if additional samples of n₁ and n₂ items
are drawn from the two populations.


Example VI. In a large city, A, 20% of a random sample of 900
schoolboys had a certain slight physical defect. In another large city,
B, 18.5% of a random sample of 1600 schoolboys had the same defect.
Is the difference between the proportions significant?

(i) p = (180 + 296)/2500 = 476/2500

s.e. = √{(476 × 2024/2500²)(1/900 + 1/1600)} = 0.016

(p₁ − p₂)/s.e. = 0.015/0.016 = 0.94

The difference, therefore, is not at all significant.

(ii) (s.e.₁)² = 0.16/900;  (s.e.₂)² = (0.185 × 0.815)/1600

s.e.diff = 0.0165, so that differences may very well be hidden in
similar samples from the same material.
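
Both standard errors used in Example VI can be computed side by side. This is a sketch with invented names (`compare_proportions`, `se_same`, `se_diff`); it follows the two formulas stated for cases (i) and (ii).

```python
import math

def compare_proportions(p1, n1, p2, n2):
    """Pooled proportion, and the s.e. of p1 - p2 under cases (i) and (ii)."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    # Case (i): both populations assumed to share the pooled proportion.
    se_same = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    # Case (ii): the two population proportions assumed to differ.
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return pooled, se_same, se_diff

pooled, se_same, se_diff = compare_proportions(0.20, 900, 0.185, 1600)
ratio = (0.20 - 0.185) / se_same
print(round(pooled, 4), round(se_same, 4), round(se_diff, 4), round(ratio, 2))
```

In this example the two standard errors are nearly equal, so both views lead to the same conclusion.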


(b) Variables 

If a large number, N, of random samples, each of n items, is drawn
from a population which has mean μ and standard deviation σ, the
means of the N samples will form a distribution with mean μ and
standard deviation σ/√n.

This is true whatever the form of the distribution or the size of the
sample. The importance of the statistic σ/√n, the standard error of
the mean of samples of size n, depends on the following:

(i) If x is distributed normally with mean μ and standard
deviation σ and random samples of size n are drawn, then the
sample mean, x̄, will be distributed normally with mean μ and
standard deviation σ/√n, i.e., (x̄ − μ)√n/σ is a normal deviate.

(ii) If x is not normally distributed, so long as its frequency
curve is unimodal, only moderately skew and tails off to zero on
both sides of the mode, (x̄ − μ)√n/σ has a distribution which
approaches the normal distribution as n becomes increasingly
large; thus even when the variable x has a distribution which
differs considerably from the normal, the sampling distribution
of x̄ approximates closely enough to the normal to make the use
of Table II to find the probability of the deviate (x̄ − μ)√n/σ
permissible.


Example VII. The mean I.Q. of a sample of 1600 children was 99.
Is it likely that this was a random sample from a population with mean
I.Q. 100 and standard deviation 15?

s.e. = 15/√1600 = 0.375

1/0.375 = 2.667.

The probability of such a negative difference is 0.0039.
       „         „     an absolute      „      0.0078.

It is therefore unlikely.
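
The calculation of Example VII, sketched with `statistics.NormalDist` from the Python standard library in place of Table II. The helper name `mean_deviate` is invented here, and the computed probabilities may differ from the printed ones in the fourth decimal place.

```python
from statistics import NormalDist

def mean_deviate(sample_mean, pop_mean, pop_sd, n):
    """Deviate of a sample mean, using s.e. = pop_sd / sqrt(n)."""
    se = pop_sd / n ** 0.5
    return (sample_mean - pop_mean) / se

z = mean_deviate(99, 100, 15, 1600)
one_sided = NormalDist().cdf(z)   # probability of so large a negative difference
two_sided = 2 * one_sided         # probability of so large an absolute difference
print(round(z, 3), round(one_sided, 4), round(two_sided, 4))
```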

Example VIII. A sample, size 400, has x̄ = 6.0″. Can it be regarded
as a simple sample from a large population with mean 6.2″ and standard
deviation 2.25″?

s.e.(mean) = 2.25/20 = 0.1125

0.2/0.1125 = 1.78;  (1 − a)/2 = 0.0375.

The probability of such an absolute difference is 0.0750.

Therefore the sample might be so regarded.

Example IX. The standard deviation of a large number, N, of
observations on height is 2.67″. If random samples of 64 individuals
are drawn from the “population” of N individuals, what is the
probability that the sample mean will differ from the true mean by 0.75″
or more?

s.e.(mean) = 2.67/8 = 0.334

Difference/s.e. = 0.75/0.334 = 2.2455;  (1 − a)/2 = 0.0123

Probability = 0.0246, i.e., about 1 in 40.


Fiducial Limits of an Unknown Population Mean
Let s be the standard deviation of a random sample of n items,
whose mean is x̄. The standard deviation of the population is σ
but is unknown.

Using s as an estimate for σ, s.e. of sample mean = s/√n.

Then the assumption that the population mean μ lies between
x̄ − 1.96 s/√n and x̄ + 1.96 s/√n will be true in the case of 95% of
similar samples.

The values x̄ ± 1.96 s/√n are termed the 95% fiducial limits of μ,
the mean of the population from which the sample was drawn.

x̄ ± 2.33 s/√n and x̄ ± 2.58 s/√n are the 98% and 99% fiducial
limits of μ, respectively.


Example X. A random sample of 625 items from a normal population
of unknown mean has x̄ = 10 and s.d. = 1.5. What are (a) the 99.5%
and (b) the 99.8% fiducial limits of μ?

s.e.(mean) = 1.5/√625 = 0.06

(a) 99.5% F.L. = 10 ± 2.81(0.06).
(b) 99.8% F.L. = 10 ± 3.09(0.06).
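
The multipliers 2.81 and 3.09 in Example X are the normal deviates that leave 0.25% and 0.1% in each tail. A sketch, assuming `statistics.NormalDist().inv_cdf` as a substitute for reading Table II backwards; the function name `fiducial_limits` is invented for the illustration.

```python
from statistics import NormalDist

def fiducial_limits(mean, sd, n, level):
    """Return the multiplier k and the limits mean -/+ k * sd / sqrt(n)."""
    se = sd / n ** 0.5
    k = NormalDist().inv_cdf(0.5 + level / 2)   # deviate leaving (1 - level)/2 in each tail
    return k, mean - k * se, mean + k * se

for level in (0.995, 0.998):
    k, low, high = fiducial_limits(10, 1.5, 625, level)
    print(f"{level:.1%}: multiplier {k:.2f}, limits {low:.3f} to {high:.3f}")
```

The same function with `level=0.95`, `0.98` or `0.99` gives multipliers close to the 1.96, 2.33 and 2.58 quoted above.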


Standard Error of the Difference between the Means of Two Samples 

(i) Two independent random samples, with n₁ and n₂ members,
respectively, are drawn from the same population, of standard
deviation σ.

The standard error of the difference between the sample means

= √{σ²(1/n₁ + 1/n₂)}

If σ is unknown, s for the combined samples must be substituted.

(ii) Two random samples, with x̄₁, s₁, n₁ and x̄₂, s₂, n₂, respectively,
are drawn from different populations.

The s.e. of the difference between the means

= √(σ₁²/n₁ + σ₂²/n₂)

or, where σ₁ and σ₂ are unknown,

= √(s₁²/n₁ + s₂²/n₂)


Example XI. Random samples of 500 and 400 have means 11.5 and
10.9, respectively. Can the samples be regarded as drawn from the
same population of standard deviation 5?

s.e.diff = 5√(1/500 + 1/400) = 0.335.

Difference/s.e. = 0.6/0.335 = 1.79.

The difference is therefore not significant, i.e., there is no reason to
doubt the hypothesis.

Example XII. Bricks of the same type, produced at two different
works, were tested for transverse strength, with the following results.

                                        Works I.   Works II.

No. in random sample . . . . . .          300         270

Mean strength in lb. per sq. in. .        990        1000

Standard deviation, lb. per sq. in.       240         202

Is the difference between the means significant?

s.e.diff = √(57600/300 + 40804/270) = 18.5

10/18.5 = 0.54   ∴ difference is not significant.
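
The arithmetic of Example XII can be checked directly from the formula for samples drawn from different populations. The function name below is chosen for the illustration only.

```python
import math

def se_difference_of_means(s1, n1, s2, n2):
    """s.e. of the difference between two sample means, using the sample
    standard deviations in place of the unknown population values."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

se = se_difference_of_means(240, 300, 202, 270)
ratio = (1000 - 990) / se
print(round(se, 1), round(ratio, 2))
```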


Standard Error of the Standard Deviation 

The standard error of the standard deviation is $\dfrac{\sigma}{\sqrt{2n}}$ in the case of large random samples from a normal population.

It should be noted that if the distribution of the parent population 
is not normal the use of this formula may lead to serious error. 

Standard Error of the Difference between Two Standard Deviations 
In the case of two large random samples, each drawn from a normally distributed population, the s.e. of the difference between $s_1$ and $s_2$ is

$$\sqrt{\frac{s_1^2}{2n_1} + \frac{s_2^2}{2n_2}}$$

Example XIII. Is the difference between the standard deviations in Example XII significant?

s.e. $= 0.707$ of s.e.$(\bar{x}_1 - \bar{x}_2) = 13.1$.

$\dfrac{240 - 202}{13.1} = 2.9$; $\therefore$ difference is significant.

Example XII indicates that there is no reason to suppose—on the 
evidence of this test—that the bricks from the two works differ in 
average transverse strength. On the other hand, there is evidence 
in Example XIII that the product of Works I shows greater 
variability than the product of Works II. 

Standard Error of a Correlation Coefficient 

This is commonly given as s.e.$_r = \dfrac{1 - r^2}{\sqrt{n}}$.

It is important to note that this form of the s.e. of $r$ can be used only when $n$ is large and $r$ is moderate or small, say less than 0.5. The sampling distribution of $r$ is not even approximately normal when $r$ is large. The $t$-test of the significance of $r$, which can be used in all cases, is given later in this chapter.


Example XIV. A random sample of 1600 pairs shows a correlation 
coefficient of 0*40. Estimate the limits to the correlation in the 
population. 


s.e.$_r = \dfrac{1 - 0.16}{40} = 0.021$.

The limits may be taken as $r \pm 3$(s.e.), i.e., 0.463 and 0.337.

Example XV. A correlation coeff. of 0-15 is derived from a random 
sample of 900 pairs of observations. Is this value of r significant ? 

The point at issue is: supposing the variables, x and y, in the popula¬ 
tion, to be uncorrelated, is it possible for the variations of random 
sampling to give rise to an r of this size in a sample of 900 pairs ? 

Adopting the hypothesis that the variables are uncorrelated 


s.e. $= \dfrac{1}{\sqrt{900}} = 0.033$; $\dfrac{0.15}{0.033} = 4.5$.

$\therefore$ $\rho$, the coeff. of correlation between the variables in the population, could not be zero, i.e., $r = 0.15$ is a significant value.

Note that if the sample had been one of 100 pairs only then

s.e. $= 0.1$; $\dfrac{0.15}{0.1} = 1.5$,

i.e., $r$ in other similar samples of 100 pairs might well be zero.


Size of Samples 

Example XVI. The standard deviation of a population is 5". What 
size of random sample must be drawn from this population if the 
probability that the sample mean will not differ from the population 
mean by more than 1" is required to be 95% ? 


Let $n$ = no. in sample;

then s.e. of mean $= \dfrac{5}{\sqrt{n}}$ inches,

and $1.96 \times \dfrac{5}{\sqrt{n}} = 1$ inch,

$\therefore n = (9.8)^2 = 96$ or 97.

If $n = 100$ the 95% F.L. of $\mu$ would be $\bar{x} \pm 1.96(0.5)$.
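
The reasoning of Example XVI amounts to solving $1.96\,\sigma/\sqrt{n} = e$ for $n$. A short sketch follows; the name `sample_size` and the choice to round upwards are assumptions made here, the text itself being content with "96 or 97".

```python
from math import ceil

def sample_size(sigma, max_error, z=1.96):
    """Smallest n for which z * sigma / sqrt(n) does not exceed max_error."""
    return ceil((z * sigma / max_error) ** 2)

# Example XVI: sigma = 5 inches, permitted error 1 inch, 95% probability.
n_required = sample_size(5, 1)
print(n_required)
```

Rounding upwards gives 97, the smaller figure of 96 falling very slightly short of the required probability.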


II. The $\chi^2$ (or Chi-squared) Test

The $\chi^2$ test of significance is very widely used in statistical work. It is applicable both to large and to small samples.

The statistic $\chi^2$ may be defined as

$$\chi^2 = \sum\left\{\frac{(f_o - f_e)^2}{f_e}\right\} \qquad (1)$$

where
$f_o$ = an observed class frequency,
$f_e$ = the theoretical class frequency,
and the summation extends over all the classes in the data.

Formula (1) may be written:

$$\chi^2 = \sum\left(\frac{f_o^2}{f_e}\right) - N \qquad (2)$$

where $N$ = the total frequency.

The $\chi^2$ table* is entered at the appropriate value of $\nu$, or number of "degrees of freedom", i.e., the number of classes to which values can be assigned arbitrarily.

* P indicates various levels of significance. A figure in the body of the table gives the value of $\chi^2$ which is significant, for $\nu$ degrees of freedom, at the stated level of significance.

Thus for 10 d.f., $\chi^2 = 18.31$ is significant at the 0.05 level. P, 0.05 in this case, is the proportion of the area under the distribution curve of $\chi^2$ with 10 d.f. to the right of the ordinate at $\chi^2 = 18.31$. P, therefore, gives the probability of exceeding that value of the statistic in random sampling.

If P is small, the value of $\chi^2$ is regarded as significantly large, i.e., as indicating that the differences between observation and expectation are not likely to be due to chance. The 0.05 and 0.01 levels of significance are commonly adopted.

Very large values of P are also significant, since if P is very large there is a very low probability of obtaining a smaller value of $\chi^2$. Very large values of P in fact suggest that the agreement between observation and theory is too close to be credible.

The technique of the $\chi^2$ test is the same as that which has been described in connection with the normal curve, in so far as both depend on determining the area of a portion of a continuous distribution curve and relating this to the probability of an event. But there are important differences between the two distributions.

The horizontal axis of the normal curve extends from $-\infty$ to $+\infty$, and the curve is symmetrical. It is also unique, showing the distribution of any normal deviate, and in consequence the probability integral of the normal curve can be tabulated as fully as is necessary in compact form.

The horizontal axis of the distribution of $\chi^2$ extends from 0 to $+\infty$, since negative values cannot occur. The shape of the distribution curve varies according to the number of degrees of freedom involved. When $\nu = 1$ the curve is that of the square of a normal deviate with standard deviation 1. When $\nu$ is greater than 1 the curve is unimodal with marked positive skewness for low values of $\nu$. As $\nu$ increases the curve tends to symmetry. When $\nu$ is greater than 30 the probability of a value of $\chi^2$ can be obtained from Table II, and in consequence tables of $\chi^2$ are not taken beyond $\nu = 30$.

Since for each value of $\nu$, from 1 to 30, the distribution curve has a different shape, thirty different probability integral tables would be needed if the $\chi^2$ integral were to be fully tabulated. This would be impossibly cumbersome, and for this reason, and still more because it satisfies all ordinary needs, the $\chi^2$ table gives only, for each value of $\nu$, the values of $\chi^2$ corresponding to a few significance levels. The probabilities of other values of $\chi^2$ can if necessary be obtained approximately by interpolation. In general, however, all that is required is to find out if a $\chi^2$ obtained from the data is, or is not, significant at the 0.05 or 0.01 level.



Fig. 11.

The proportion of the area of the curve to the right of the ordinate at $\chi^2 = 10$ is the probability, tabular P, of exceeding a value of 10 in random sampling.

Example XVII. A coin is tossed 100 times. What is the probability 
of 62 or more heads if the coin is unbiased? (See Example XIV, 
Chapter IX.) 



                               Heads.   Tails.
Observed                         62       38
Expected                         50       50
$(f_o - f_e)$, difference        12       12
$d^2/f_e$                       2.88     2.88

There is only 1 d.f. here, since if the number of heads is known the number of tails is also known.

Using a detailed table of $\chi^2$ for 1 d.f. we find
for $\chi^2 = 5.76$, P = 0.0164.

But this is the chance of getting either 62 or more heads or 62 or more tails.

$\therefore$ Probability of 62 or more heads $= \tfrac{1}{2}P = 0.0082$.



In Example XIV we found the probability to be 0.0107. The $\chi^2$ test, as applied above, exaggerates the significance of the result of the experiment. The discrepancy is accounted for as follows:

The $\chi^2$ test assumes that the variable is distributed normally and that the variation is consequently continuous. Sixty-two must therefore be interpreted as the interval 61.5-62.5 and 38 as the interval 37.5-38.5. We then have each difference diminished by 0.5, with the following result:

$d = (f_o - f_e)$:    11.5     11.5
$d^2/f_e$:           2.645    2.645      $\chi^2 = 5.29$

For $\nu = 1$, $\chi^2 = 5.29$, the extended table gives P = 0.0214 and $\tfrac{1}{2}P = 0.0107$, as in Example XIV.

This correction, which is known as the Yates' Correction for Continuity, should always be made when $\nu = 1$ and N is small. Even when N is not conventionally small, the application of the correction will ensure that the probability of a difference from expectation thus determined will agree more closely with that obtained by the direct, and correct, binomial method.

It should be noted that the Yates' Correction should not be made when $\nu$ is greater than 1.
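
The two calculations of Example XVII, with and without the correction, may be sketched as below. The function names are illustrative and not the author's. The probability for 1 d.f. is obtained from the normal integral (through the standard-library `erfc`) in place of the detailed table the text refers to, which is legitimate because for 1 d.f. $\chi^2$ is the square of a normal deviate.

```python
from math import erfc, sqrt

def chi2_two_class(observed, expected, yates=False):
    """Sum of d^2 / fe; with yates=True each |d| is first reduced by 0.5."""
    total = 0.0
    for fo, fe in zip(observed, expected):
        d = abs(fo - fe)
        if yates:
            d = max(d - 0.5, 0.0)
        total += d * d / fe
    return total

def p_one_df(chi2):
    """Two-tail probability of exceeding chi2 when there is 1 d.f."""
    return erfc(sqrt(chi2 / 2))

# Example XVII: 62 heads and 38 tails against 50 and 50 expected.
plain = chi2_two_class([62, 38], [50, 50])
corrected = chi2_two_class([62, 38], [50, 50], yates=True)
half_p_plain = p_one_df(plain) / 2
half_p_corrected = p_one_df(corrected) / 2
print(round(plain, 2), round(half_p_plain, 4))
print(round(corrected, 2), round(half_p_corrected, 4))
```

The corrected half-probability should reproduce the binomial figure of 0.0107 quoted from Example XIV.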


$\chi^2$ and the Normal Distribution

When there is only one d.f. $\chi^2$ is the square of a normal deviate with zero mean and standard deviation unity. Thus the values of $\chi^2$ in the top line of Table III are the squares of normal deviates at the stated levels of significance, e.g., $\chi^2 = 3.84$ is significant at the 0.05 level, i.e., P = 0.05, and a normal deviate $\dfrac{x}{\sigma} = \sqrt{3.84}$, i.e., 1.96, is significant at that level.

When $\nu = 1$, therefore, the probability of any value of $\chi^2$ can be found by the use of Table II.


Example XVIII. Two hundred and twenty-eight correct answers out 
of 1000 were given when the “ expected ” number was 200. (See 
Example XVI, Chapter IX.) 


                Right.   Wrong.
Observed          228      772
Expected          200      800
Difference         28       28
$d^2/f_e$        3.92     0.98       $\chi^2 = 4.90$, $\tfrac{1}{2}P = 0.0134$

or $\dfrac{x}{\sigma} = \sqrt{4.90} = 2.21$, $\tfrac{1}{2}P = 0.0135$.

If the correction is applied:

$d$:          27.5     27.5       $\chi^2 = 4.726$
$d^2/f_e$:   3.781    0.945       $\tfrac{1}{2}P = 0.015$

or $\dfrac{x}{\sigma} = \sqrt{4.726}$, $\tfrac{1}{2}P = 0.015$, which agrees with the result of Example XVI.

2x2 Contingency Tables 

Example XIX. The attributes R and Q, e.g., red-hair and quick* 
temper, are popularly supposed to be associated. Do the following 
data support that opinion ? 




                         Attribute R.
                       R.      Not R.     Totals.
Attribute Q   Q        35       135         170
              Not Q   115       715         830
Totals:               150       850        1000


Interpretation of the Table 

Out of a random sample of 1000 individuals 170 have attribute Q, 
150 attribute R, and 35 both attributes. 

If R and Q tend to be associated, then the proportion of the Q’s 
amongst R’s will be greater than the proportion of Q’s amongst the 
not R’s. 

If, on the other hand, there is no association, then the expected number of R's in the above sample who are also Q's, i.e. the "independence value" of the (RQ) cell, would be:

$$\frac{150 \times 170}{1000} = 25.5.$$

Since the marginal totals are fixed, the three remaining cells can be filled in by inspection; i.e., in a 2 × 2 table there is only 1 degree of freedom.


Expected Frequencies

              R.       Not R.
Q            25.5      144.5
Not Q       124.5      705.5

$f_o$      $f_e$      $(f_o - f_e)^2/f_e$
 35         25.5         3.53
115        124.5         0.72
135        144.5         0.62
715        705.5         0.13
                         5.00 = $\chi^2$



For 1 d.f., from the table, for $\chi^2 = 6.64$, P = 0.01
                            for $\chi^2 = 3.84$, P = 0.05

$\chi^2 = 5.00$ is significant at about the 2.5% level, i.e., such a value could occur about once in forty such samples through the variations of random sampling, and the "evidence" so far as it goes supports the view that red-haired people tend to be quick-tempered.

When applying the $\chi^2$ test there is no need to determine P with any great accuracy. All that is required is to find out if the value of $\chi^2$ calculated is significant at some pre-determined level. If the Yates' Correction be applied in the above example, P becomes 0.033.

If N is small, the correction should be applied. In a 2 × 2 table it consists in adding ½ to the observations in those cells which are below expectation and subtracting ½ from the observations in the cells which are above expectation.
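
The independence values and the value of $\chi^2$ in Example XIX follow directly from the marginal totals. A minimal sketch, with function names chosen here for illustration:

```python
def expected_frequencies(table):
    """Independence values: (row total x column total) / N for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared(table):
    """Sum of (fo - fe)^2 / fe over all cells, without Yates' Correction."""
    expected = expected_frequencies(table)
    return sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected)
               for fo, fe in zip(obs_row, exp_row))

# Example XIX: rows are Q and Not Q, columns are R and Not R.
red_hair = [[35, 135], [115, 715]]
independence = expected_frequencies(red_hair)
chi2_xix = chi_squared(red_hair)
print(independence)
print(round(chi2_xix, 2))
```

Carried to more decimal places than the hand calculation, the sum comes to slightly over 5.00. The same two functions apply unchanged to tables with more than two rows or columns.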

Example XX. Do the following data suggest any association between 
attributes A and B ? 



                         Attribute A.
                       A.      Not A.     Totals.
Attribute B   B        10         8          18
              Not B     6        26          32
Totals:                16        34          50

Observed frequency     Expected
(corrected).           frequency.      $d^2/f_e$
      9½                  5.76           2.43
      6½                 10.24           1.37
      8½                 12.24           1.14
     25½                 21.76           0.64
                              $\chi^2$ = 5.58

This value of $\chi^2$ is significant at the 0.02 level, so that association is certainly suggested.

The value of $\chi^2$ obtained from the data, if the correction is not made, is 7.17 and P = 0.0074, which greatly exaggerates the significance.

In a 2 × 2 table where the cell frequencies and marginal totals are as below

   a          b         (a + b)
   c          d         (c + d)
(a + c)    (b + d)         N

where N is total frequency and $ad$ the larger cross-product,

(i) $$\chi^2 = \frac{(ad - bc)^2 N}{(a + c)(b + d)(c + d)(a + b)},$$

or with Yates' Correction

$$\chi^2 = \frac{(ad - bc - \tfrac{1}{2}N)^2 N}{(a + c)(b + d)(c + d)(a + b)}.$$

(ii) The variance of the number in any one cell is

$$\text{var} = \frac{(a + c)(b + d)(c + d)(a + b)}{N^3}.$$

Example XXI. Is there any significant association between D and B ? 



              D.      Not D.     Totals.
B             18       162         180
Not B          4        96         100
Totals:       22       258         280

(a)
Observed         Independence
frequencies.     values.          $d^2/f_e$
    18             14.143           1.05
   162            165.857           0.09
     4              7.857           1.89
    96             92.143           0.16
                                    3.19 = $\chi^2$

P = 0.0736. Therefore test gives support to the hypothesis of no association.

(b) $$\chi^2 = \frac{(18 \times 96 - 162 \times 4)^2 \times 280}{22 \times 258 \times 100 \times 180} = 3.19.$$

(c) $$\sigma^2 = \frac{22 \times 258 \times 100 \times 180}{280^3} = 4.654, \qquad \sigma = 2.157.$$

The deviation from expectation = 18 − 14.143 = 3.857.

$\dfrac{x}{\sigma} = \dfrac{3.857}{2.157} = 1.788$, $\tfrac{1}{2}P = 0.0368$.

Sum of tails = 0.0736 as before.
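
Methods (b) and (c) of Example XXI use only the four cell frequencies, and are easily put into code. In this sketch the function names are introduced for illustration, and the absolute value of the cross-product difference is taken so that the cells need not be arranged with $ad$ the larger.

```python
from math import sqrt

def chi2_2x2(a, b, c, d, yates=False):
    """Short formula for a 2 x 2 table, optionally with Yates' Correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0)
    return diff ** 2 * n / ((a + c) * (b + d) * (c + d) * (a + b))

def cell_variance(a, b, c, d):
    """Variance of the number in any one cell, as in formula (ii)."""
    n = a + b + c + d
    return (a + c) * (b + d) * (c + d) * (a + b) / n ** 3

# Example XXI.
chi2_xxi = chi2_2x2(18, 162, 4, 96)
var_xxi = cell_variance(18, 162, 4, 96)
deviate_xxi = (18 - 180 * 22 / 280) / sqrt(var_xxi)

# Example XX, without and with the correction.
chi2_xx_plain = chi2_2x2(10, 8, 6, 26)
chi2_xx_yates = chi2_2x2(10, 8, 6, 26, yates=True)
print(round(chi2_xxi, 2), round(var_xxi, 3), round(deviate_xxi, 3))
print(round(chi2_xx_plain, 2), round(chi2_xx_yates, 2))
```

Note that the square of the normal deviate found in (c) equals the $\chi^2$ of (b), as the text's section on $\chi^2$ and the normal distribution leads one to expect.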


Example XXII. Children having one parent of blood-type M and the 
other of blood-type N will always be one of three types, M, MN, N, and 
average proportions of these will be 1 : 2 : 1. 

A report states: "Of 162 children having one M parent and one N parent, 28.4% were found to be of type M, 42.0% of type MN and the remainder of type N. The low value of $\chi^2$ demonstrates the truth of the genetic theory."

Calculate the value of $\chi^2$, make the appropriate test of significance and comment on the conclusion. (R.S.S. Certificate 1947)

(a) Calculate the value of $\chi^2$, using the percentage frequencies.

         % observed.   % expected.     d.      $d^2/f_e$.
M           28.4           25          3.4      0.4624
MN          42.0           50          8.0      1.2800
N           29.6           25          4.6      0.8464
           100.0          100                   2.5888

(b) Calculate the value of $\chi^2$ using the actual frequencies.

         Observed.     Expected.       d.      $d^2/f_e$.
M           46            40.5         5.5      0.747
MN          68            81          13        2.086
N           48            40.5         7.5      1.389
           162           162                    4.222

The above calculations illustrate the fact that if $\chi^2$ is calculated from the percentage frequencies, the value of $\chi^2$ thus obtained must be multiplied by $\dfrac{N}{100}$ in order to obtain the true value, i.e., from (a),

$$\chi^2 = 2.59 \times \frac{162}{100} = 4.20.$$

In the above table $\nu = 2$. Table III shows that for 2 d.f. a $\chi^2 = 5.99$ is significant at the 0.05 level. Therefore the data afford no evidence against the hypothesis that the "true" proportions are 1 : 2 : 1.
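
The point of Example XXII, that a $\chi^2$ computed from percentages must be scaled by N/100, can be verified numerically. The sketch below uses an illustrative function name, `chi2_fit`, which is not the author's.

```python
def chi2_fit(observed, expected):
    """Sum of (fo - fe)^2 / fe over the classes."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# (a) percentage frequencies, (b) actual frequencies out of N = 162.
from_percentages = chi2_fit([28.4, 42.0, 29.6], [25, 50, 25])
from_counts = chi2_fit([46, 68, 48], [40.5, 81, 40.5])
rescaled = from_percentages * 162 / 100
print(round(from_percentages, 4), round(from_counts, 3), round(rescaled, 2))
```

The small difference between the rescaled value and that found from the actual counts arises because the published percentages are themselves rounded.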

The h × k Contingency Table

Example XXIII. Relationship between number of wage earners and 
output per manshift at coal mines employing 100 or more wage-earners 
in Great Britain in 1945. 


Size of mine.        No. of mines with an output per manshift of:         Total
No. of               Under      15 cwt. and     20 cwt. and     25 cwt.   mines.
wage-earners.        15 cwt.    under 20 cwt.   under 25 cwt.   and over.

100-499                103          140              76            42       361
500-999                 58          131              76            39       304
1000 and over           25           73              83            48       229
Total mines:           186          344             235           129       894

[Ministry of Fuel & Power Statistical Digest, 1945]

Compute a measure which will indicate the relationship referred to in the heading of this table.

(L.U. B.Sc. Econ., 1947)


Expected Frequencies
(Nearest whole number)

  75     139      95      52     361
  63     117      80      44     304
  48      88      60      33     229
 186     344     235     129     894

$\chi^2 = 48.3$; $\nu = (4 - 1)(3 - 1) = 6$.

Since for $\nu = 6$, $\chi^2 = 16.81$ is significant at the 0.01 level
             and $\chi^2 = 22.46$ is significant at the 0.001 level,

there can be no doubt that there is association between the number of wage-earners employed in a mine and the output per manshift in that mine.

One measure of the relationship is the

Coefficient of Mean Square Contingency, C.

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} = \sqrt{\frac{48.3}{894 + 48.3}} = 0.23.$$


In calculating $\chi^2$ from a contingency table the use of the alternative formula, $\chi^2 = \sum\left(\dfrac{f_o^2}{f_e}\right) - N$, involves less arithmetic. On the other hand, the use of the formula $\sum\left\{\dfrac{(f_o - f_e)^2}{f_e}\right\}$ has the advantage of disclosing the contribution to $\chi^2$ made by each class, or cell, thus drawing attention to the more significant classes. It is also an advantage to have the expected cell frequencies. The $\chi^2$ test indicates whether or not there is association; it does not give any information about the nature of the association. It is much easier to determine this if the expected frequency is placed, in brackets, beside the observed frequency in each cell.

Size of Cell Frequencies

In the application of the $\chi^2$ test it is desirable that no theoretical class frequency shall be less than 10, and certainly not less than 6. If small theoretical frequencies occur, it is generally possible to overcome the difficulty by grouping two or more classes together. This procedure is illustrated in the next example.


The $\chi^2$ Test of "Goodness of Fit" *

Example XXIV. The following table gives the observed frequencies of a distribution, and the frequencies of the normal distribution having the same values of $\bar{x}$, N, and $\sigma$. Apply the $\chi^2$ test of goodness of fit.

$f_o$:    4    31    111    217    270    223    110    29    5
$f_e$:    6    31    107    216    280    216    107    31    6

  $f_o$            $f_e$           $(f_o - f_e)^2/f_e$
   4 }              6 }
  31 } 35          31 } 37            0.1081      $(37 - 35)^2/37$
 111              107                 0.1495
 217              216                 0.0046
 270              280                 0.3571
 223              216                 0.2268
 110              107                 0.0841
  29 }             31 }
   5 } 34           6 } 37            0.2432      $(37 - 34)^2/37$
                          $\chi^2$ =  1.1734





Without grouping the value of $\chi^2$ obtained is 1.7845. The difference happens to be immaterial in this case. If, however, the first and last frequencies had, by chance, been 1 and 3, respectively, then the first and last classes, if ungrouped, would by themselves have contributed 5.67 to $\chi^2$. Failure to group may, in short, give a quite unjustified significance to the value of $\chi^2$ obtained.

The number of degrees of freedom is 1 for each class less 1 for each "restraint". The 9 original classes have been reduced to 7 by grouping, thus reducing the d.f. by 2. In addition, the mean, standard deviation and total frequency of the original distribution have been used in calculating the theoretical frequencies, thus introducing three restraints. The number of d.f. is accordingly 4.

For d.f. = 4, a $\chi^2 = 9.49$ is significant at the 0.05 level; a $\chi^2$ greater than 3.36 has a probability of 0.5. The fit is therefore good.
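
The grouping of the end classes in Example XXIV can be sketched as follows. The helper `group_tails` and its threshold argument are assumptions made for illustration; the frequencies are those of the example, the first observed and theoretical values (4 and 6) being those implied by the grouped totals of 35 and 37.

```python
def group_tails(fo, fe, minimum):
    """Merge end classes inwards until each end expected frequency >= minimum."""
    fo, fe = list(fo), list(fe)
    while len(fe) > 1 and fe[0] < minimum:
        fo[:2] = [fo[0] + fo[1]]
        fe[:2] = [fe[0] + fe[1]]
    while len(fe) > 1 and fe[-1] < minimum:
        fo[-2:] = [fo[-2] + fo[-1]]
        fe[-2:] = [fe[-2] + fe[-1]]
    return fo, fe

def chi2_fit(observed, expected):
    """Sum of (fo - fe)^2 / fe over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [4, 31, 111, 217, 270, 223, 110, 29, 5]
theoretical = [6, 31, 107, 216, 280, 216, 107, 31, 6]

grouped_o, grouped_e = group_tails(observed, theoretical, minimum=10)
chi2_grouped = chi2_fit(grouped_o, grouped_e)
chi2_ungrouped = chi2_fit(observed, theoretical)
degrees_of_freedom = len(grouped_e) - 3   # three restraints: mean, s.d., N
print(grouped_o, grouped_e)
print(round(chi2_grouped, 4), round(chi2_ungrouped, 4), degrees_of_freedom)
```

The machine values differ from 1.1734 in the fourth decimal place only because the hand calculation sums terms already rounded.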

The Significance of x 2 when v is Greater than 30 

When $\nu$ is greater than 30, $\sqrt{2\chi^2}$ is distributed approximately normally about the mean $\sqrt{2\nu - 1}$, with unit standard deviation; i.e., $\sqrt{2\chi^2} - \sqrt{2\nu - 1}$ may be used as a normal deviate.

Example XXV. Find P, when $\chi^2 = 60$ and $\nu = 36$.

$\sqrt{120} = 10.954$, $\sqrt{72 - 1} = 8.426$

$\dfrac{x}{\sigma} = 10.954 - 8.426 = 2.53$

P = 0.0114.
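
A sketch of the calculation in Example XXV follows. The function name is illustrative. The printed P of 0.0114 corresponds to the area in both tails of the normal curve beyond the deviate, and that is the assumption made in the code.

```python
from math import erfc, sqrt

def chi2_normal_deviate(chi2, nu):
    """Approximate normal deviate for chi-squared when nu exceeds 30."""
    return sqrt(2 * chi2) - sqrt(2 * nu - 1)

deviate = chi2_normal_deviate(60, 36)
# Area in both tails of the unit normal curve beyond the deviate.
p_two_tail = erfc(deviate / sqrt(2))
print(round(deviate, 2), round(p_two_tail, 4))
```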



III. To Test whether a Distribution Differs Significantly from the Normal Type. (Alternative Method.)

Standard Errors of $\gamma_1$ and $\gamma_2$

$$\gamma_1 = \sqrt{\beta_1}, \qquad \gamma_2 = \beta_2 - 3.$$

In the normal distribution

$$\gamma_1 = 0, \qquad \gamma_2 = 0.$$

The test, therefore, is to see if $\gamma_1$ and $\gamma_2$ of the given distribution differ significantly from zero.

$$\text{s.e.}_{\gamma_1} = \sqrt{\frac{6}{N}}, \qquad \text{s.e.}_{\gamma_2} = \sqrt{\frac{24}{N}}, \qquad \text{where } N = \sum f.$$

Example XXVI. In a distribution of 1000 items $\gamma_1$ is found to be 0.011 and $\gamma_2$ 0.25.

s.e.$_{\gamma_1} = \sqrt{\dfrac{6}{1000}} = 0.078$; $\dfrac{\text{diff.}}{\text{s.e.}} = \dfrac{0.011}{0.078} = 0.14$, which is quite insignificant.

s.e.$_{\gamma_2} = \sqrt{\dfrac{24}{1000}} = 0.155$; $\dfrac{\text{diff.}}{\text{s.e.}} = \dfrac{0.25}{0.155} = 1.6$, which is not significant.

$\therefore$ distribution does not differ significantly from normal.

IV. Small Samples 

The significance tests illustrated in Section I of this chapter are 
applicable to large samples only. The following tests are applicable 
to samples of any size. 

(a) The t-Distribution

In the case of random samples of size $n$ from a normal population of mean $\mu$ and standard deviation $\sigma$, the statistic $\dfrac{(\bar{x} - \mu)\sqrt{n}}{\sigma}$ is distributed normally. In general the value of the constant $\sigma$ is unknown, and the variable estimate $s$, derived from the sample, has to be used. The statistic $t = \dfrac{(\bar{x} - \mu)\sqrt{n}}{s}$ is not distributed normally.


Estimates of Population Variance

The estimate of the population mean, $\mu$, is always $\bar{x} = \dfrac{\sum x}{n}$, but the estimate of the population variance, in small samples, must be $\dfrac{\sum (x - \bar{x})^2}{n - 1}$ instead of $\dfrac{\sum (x - \bar{x})^2}{n}$, which is permissible in the case of large samples.

If $\bar{x} \neq \mu$, then $\sum (x - \bar{x})^2$, since it is a minimum, must be less than $\sum (x - \mu)^2$. Consequently $\dfrac{\sum (x - \bar{x})^2}{n}$ underestimates the population variance.

It can be shown that $\dfrac{\sum (x - \bar{x})^2}{n - 1}$ gives an "unbiased" estimate of $\sigma^2$, i.e., the mean value of $s^2$ over a large number of random samples tends to $\sigma^2$ and becomes $\sigma^2$ when the number of samples is "infinitely" large.

It should be clearly understood that the correct divisor of the 
“ sum of squares ” is ( n — 1) in all cases. When n is large, however, 
the difference in the quotient is negligible. 

With frequency $n$, there are $n$ such terms as $(x_i - \bar{x})$; but since $\sum (x - \bar{x}) = 0$, there are only $(n - 1)$ independent terms for calculating the variance. By making $\bar{x} = \dfrac{\sum x}{n}$ one restraint has been imposed and one degree of freedom lost.

In calculating the variance not $n$ but $\nu$, the number of degrees of freedom, is the appropriate divisor.

The statistic $t = \dfrac{(\bar{x} - \mu)\sqrt{n}}{s}$, where $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$, is based on $(n - 1)$ degrees of freedom.

When v is small the distribution of t is far from normal. When v 
is infinite the two distributions are identical and for values of v over 
60 the differences are small. 

The t-Table 

The t-table (see Table IV, page 348) is the probability integral of the t-distribution. It gives, over a range of values of $\nu$, the probabilities of exceeding by chance values of $t$ at different levels of significance.

The table is entered at the appropriate value of $\nu$; the tabular entries are values of $t$, the probabilities of exceeding which by chance are given by the P's at the heads of the columns.

For $\nu = 7$, e.g., the probability of exceeding $t = 2.37$ is 0.05. P is the proportion of the area under the two tails of the t-distribution curve; i.e., it is the probability that the tabular value of $t$ will be exceeded numerically.

If $t$ be regarded as a normal variate in the above case, $\dfrac{x}{\sigma} = 2.37$ corresponds to a proportion of area under the two tails of the normal distribution of only 0.018. The use of the normal table, for small values of $n$, thus greatly exaggerates the significance of a result.

The t-Distribution Curve 
The distribution curve of the statistic $t$ is

$$y = \frac{y_0}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac{1}{2}(\nu + 1)}}$$

where $\nu$ = d.f. and $y_0$ is such that the total area under the curve is unity.

The $t$ curve has a mode, coinciding with the mean, at $t = 0$. It is symmetrical and, like the normal curve, extends to infinity on either side of the mean.

When $\nu \to \infty$,

$$\frac{y_0}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac{1}{2}(\nu + 1)}} \to y_0\, e^{-\frac{1}{2}t^2},$$

i.e., the distribution of $t$ becomes normal.

When $\nu$ is small the spread of the $t$ curve is notably greater than that of the normal curve.

Whereas in the normal curve 5% and 1% of the total area is outside the limits 0 ± 1.96 and 0 ± 2.58 respectively, the corresponding values of $t$, when $\nu = 5$, are ± 2.57 and ± 4.03 respectively. When $\nu = 120$, these limits become ± 1.98 and ± 2.62.





t-Tests of Significance

It should be noted that in making these tests it is assumed that the distribution of the population is normal, or nearly so.


(i) Test of an Assumed Population Mean 
Example XXVII. The nine items of a sample had the following values: 
45, 47, 50, 52, 48, 47, 49, 53, 51. 


Does the mean of the nine items differ significantly from an assumed 
population mean of 47-5 ? 


$\bar{x} = 49.11$

$s^2 = \dfrac{\sum (x - \bar{x})^2}{9 - 1}$, $s = 2.62$

$t = \dfrac{(49.11 - 47.5)\sqrt{9}}{2.62} = 1.84.$

For $\nu = 8$, $t = 1.86$ is significant at the 0.10 level.

The difference, therefore, is not at all significant, and the sample 
may well be a random sample from a normal population of mean 47*5. 
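
The calculation of Example XXVII is reproduced in the sketch below. `one_sample_t` is a name introduced here; `statistics.stdev` divides the sum of squares by $(n - 1)$, as the section on estimates of population variance requires. The tabular value 1.86 is taken from the text.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, mu):
    """Return t = (mean - mu) * sqrt(n) / s and its degrees of freedom."""
    n = len(values)
    t = (mean(values) - mu) * sqrt(n) / stdev(values)
    return t, n - 1

items = [45, 47, 50, 52, 48, 47, 49, 53, 51]
t_value, df = one_sample_t(items, 47.5)
significant_at_10_percent = abs(t_value) > 1.86   # t(0.10) for 8 d.f.
print(round(t_value, 2), df, significant_at_10_percent)
```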

Example XXVIII. A sample of 20 items has mean 42 units and standard 
deviation 5 units. Test the assumption that it is a random sample from 
a normal population with mean 45 units. 

Find the 95% and the 98% fiducial limits of $\mu$ from the data of the sample.

$$t = \frac{(45 - 42)\sqrt{20}}{5} = 2.683.$$

For $\nu = 19$, $t_{(0.02)} = 2.54$, where $t_{(0.02)}$ is the value of $t$, for 19 d.f., which is significant at the 0.02 level.

The difference, therefore, is significant at the 0.02 level; i.e., the hypothesis is discredited.

Fiducial Limits of a Population Mean

Assuming that the sample is a random sample from a normal population of unknown mean $\mu$, the 95% fiducial limits of $\mu$ are:

$$42 \pm \frac{5}{\sqrt{20}}\, t_{(0.05)} = 42 \pm \frac{5}{\sqrt{20}} \times 2.09 = 42 \pm 2.34.$$

The 98% limits are:

$$42 \pm \frac{5}{\sqrt{20}}\, t_{(0.02)} = 42 \pm 2.84,$$

i.e., the limits are 44.84 and 39.16, and it will be noted that the assumed mean of 45 units is above the upper limit.





Example XXIX. A coin is thrown 12 times and the number of heads 
noted. The experiment is made 12 times. The numbers of heads 
thrown are 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8, 10. 

Is it likely that the coin is biased ? 

$\bar{x} = 4.75$; $\mu$ = expected mean = 6.

$s^2 = \dfrac{70.25}{11} = 6.3863$

$s = 2.527$

$t = \dfrac{(6 - 4.75)\sqrt{12}}{2.527} = 1.71.$

For $\nu = 11$, $t_{(0.10)} = 1.80$. Thus there is no reason to suppose the coin is biased.


(ii) The " Difference ” Test 

Example XXX. A group of 10 children were tested to find out how 
many digits they could repeat from memory after hearing them once. 
They were given practice at this test during the next week and were 
then retested. Is the difference between the performance of the ten 
children at the two tests significant ? 

Child:    A.  B.  C.  D.  E.  F.  G.  H.  I.  J.
Test 1:   6   5   4   7   8   6   7   5   6   8
Test 2:   7   7   6   7   9   6   8   6   6   10

The point at issue is: did the practice have any effect? If it did not, the expected mean difference in the results is zero.

Child.   Test 1.   Test 2.    $d$.    $d^2$.
  A         6         7        1       1
  B         5         7        2       4
  C         4         6        2       4
  D         7         7        0       0
  E         8         9        1       1
  F         6         6        0       0
  G         7         8        1       1
  H         5         6        1       1
  I         6         6        0       0
  J         8        10        2       4
                              10      16

$\bar{d} = 1$

$\sum (d - \bar{d})^2 = 16 - 10^2/10 = 6.$

$s^2 = \dfrac{6}{9} = 0.667$, $s = 0.8165$

$t = \dfrac{\bar{d}\sqrt{n}}{s} = \dfrac{1 \times \sqrt{10}}{0.8165} = 3.87$

For $\nu = 9$, $t_{(0.01)} = 3.25$.

$t = 3.87$ is highly significant and the inference is that the practice effected an improvement in performance at this test.
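
The "difference" test of Example XXX reduces to a one-sample test on the ten differences. A brief sketch, with variable names chosen here for illustration and the tabular value 3.25 taken from the text:

```python
from math import sqrt
from statistics import mean, stdev

test_1 = [6, 5, 4, 7, 8, 6, 7, 5, 6, 8]
test_2 = [7, 7, 6, 7, 9, 6, 8, 6, 6, 10]

# One difference per child; the expected mean difference is zero.
differences = [after - before for before, after in zip(test_1, test_2)]
n_pairs = len(differences)
t_paired = mean(differences) * sqrt(n_pairs) / stdev(differences)
highly_significant = t_paired > 3.25   # t(0.01) for 9 d.f.
print(differences, round(t_paired, 2), highly_significant)
```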

(iii) Test of the Difference between the Means of Two Samples 
Two independent samples of $n_1$ and $n_2$ items, and with means of $\bar{x}_1$ and $\bar{x}_2$, are given.

May the two samples have been drawn at random from the same normal population?

Estimate of the population variance, based on both samples,

$$s^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(\bar{x}_1 - \bar{x}_2)}{s}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

Note that $t$ is based on $(n_1 + n_2 - 2)$ d.f.

The $t$ of this formula has the same distribution as that of $t$ in the equation $t = \dfrac{(\bar{x} - \mu)\sqrt{n}}{s}$.

Example XXXI. Two independent samples of 8 and 7 items respec¬ 
tively had the following values of the variable (weight in ounces). 


Sample 1: 9 11 13 11 15 9 12 14 

Sample 2. 10 12 10 14 9 8 10 


Is the difference between the means of the samples significant ? 


 $x_1$    $x_1^2$      $x_2$    $x_2^2$
   9        81          10       100
  11       121          12       144
  13       169          10       100
  11       121          14       196
  15       225           9        81
   9        81           8        64
  12       144          10       100
  14       196
  94      1138          73       785

$\bar{x}_1 = 11.75$ oz., $\bar{x}_2 = 10.43$ oz.

$\sum (x_1 - \bar{x}_1)^2 = 1138 - \dfrac{94^2}{8} = 33.5.$

$\sum (x_2 - \bar{x}_2)^2 = 785 - \dfrac{73^2}{7} = 23.7.$

$s^2 = \dfrac{33.5 + 23.7}{8 + 7 - 2} = 4.4$, $s = 2.10.$

$t = \dfrac{(11.75 - 10.43)}{2.1}\sqrt{\dfrac{56}{15}} = 1.21.$

For $\nu = 13$, $t_{(0.05)} = 2.16$. The difference is not significant; i.e., there is no evidence against the hypothesis that the samples were drawn from the same population.

The 95% fiducial limits of the difference between population means.

These limits are $(\bar{x}_1 - \bar{x}_2) \pm t_{(0.05)}\, s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$,

i.e., $1.32 \pm 2.16 \times 2.1\sqrt{\tfrac{1}{8} + \tfrac{1}{7}}$,

i.e., from −1.03 to 3.67.
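
Example XXXI may be sketched in code as below. `pooled_t` is an illustrative name; the tabular value $t_{(0.05)} = 2.16$ for 13 d.f. is taken from the text, since the standard library offers no t-table.

```python
from math import sqrt
from statistics import mean

def pooled_t(x1, x2):
    """t for the difference between two sample means, using pooled variance."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = mean(x1), mean(x2)
    sum_sq = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
    s = sqrt(sum_sq / (n1 + n2 - 2))
    se = s * sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / se, n1 + n2 - 2, m1 - m2, se

sample_1 = [9, 11, 13, 11, 15, 9, 12, 14]
sample_2 = [10, 12, 10, 14, 9, 8, 10]

t_diff, df_diff, diff, se_diff = pooled_t(sample_1, sample_2)
lower = diff - 2.16 * se_diff   # 2.16 = t(0.05) for 13 d.f., from the text
upper = diff + 2.16 * se_diff
print(round(t_diff, 2), df_diff, round(lower, 2), round(upper, 2))
```

Since zero lies within the limits, the verdict agrees with that of the $t$-test. The limits found this way differ from the printed ones in the second decimal place only through rounding of the intermediate figures.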



If zero lies outside this range of $(\mu_1 - \mu_2)$ it is concluded that the difference between the sample means is significant of a difference between the means of the populations.

Suppose that the data of Example XXX had been as follows: 

Example XXXII. Two independent groups, each of 10 children, were 
tested to find out how many digits they could repeat from memory after 
having heard them read out once. The results were as follows: 

Group A:   6   5   4   7   8   6   7   5   6   8
Group B:   7   7   6   7   9   6   8   6   6   10

Is the difference between mean "scores" of the two groups significant?

Mean of Group A = 6.2
Mean of Group B = 7.2

Sum of Squares A = 15.6
Sum of Squares B = 17.6

$s^2 = \dfrac{33.2}{10 + 10 - 2} = 1.844$, $s = 1.358$

$t = \dfrac{7.2 - 6.2}{1.358}\sqrt{\dfrac{100}{20}} = 1.65$

$\nu = 18$, $t_{(0.05)} = 2.10$.

The difference between the means is not significant.


If this form of the significance test had been applied to the data of Example XXX, a verdict of "not significant" would have been given. The method of Example XXX is the correct one to use when the data occur in pairs as they clearly do in that Example.


(iv) t-Test of the Significance of an Observed Correlation Coefficient 
With small samples the test serves to indicate whether r is, or is 
not, significantly greater than zero; that is to say, the “ null 
hypothesis ” to be tested is that the variables in the population are 
uncorrelated. 


$$t = \frac{r}{\sqrt{(1 - r^2)}}\sqrt{n - 2}.$$


Example XXXIII. A random sample of 12 pairs of observations from 
a normal population gives a coefficient of correlation of 0-45. Is this 
significant ? 


d.f. $= n - 2 = 10$.

$$t = \frac{0.45\sqrt{10}}{\{1 - (0.45)^2\}^{\frac{1}{2}}} = 1.59$$

$t_{(0.05)} = 2.23.$



The observed value of r is not significant at the 0-05 level. It is 
therefore quite likely that the variables in the population are un¬ 
correlated. 

Example XXXIV. If the number of pairs of observations in Example 
XXXIII had been 27, then 


$$t = \frac{0.45\sqrt{25}}{\{1 - (0.45)^2\}^{\frac{1}{2}}} = 2.52.$$

For $\nu = 25$, $t_{(0.02)} = 2.49$, and the hypothesis of "no correlation" would be considered unlikely.

Example XXXV. What is the least value of r, in a sample of 18 pairs, 
that is significant at the 0*05 level ? 


$\nu = 16$, $t_{(0.05)} = 2.12$.

Thus if a value of $r$ is to be significant at the 0.05 level

$$\frac{r\sqrt{16}}{\sqrt{1 - r^2}} > 2.12$$

i.e., $r^2 > 0.2192$, $\therefore r > 0.468$.

Minimum values of r , for (n — 2) d.f., that are significant at various 
levels are given in Table V, page 349. 
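
Examples XXXIII to XXXV can be verified with the short sketch below. The function names are illustrative. The second function simply inverts the formula for $t$: from $r\sqrt{\nu}/\sqrt{1 - r^2} > t$ it follows that $r^2 > t^2/(t^2 + \nu)$. The tabular values of $t$ are those quoted in the text.

```python
from math import sqrt

def t_from_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), based on n - 2 d.f."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

def least_significant_r(t_critical, n):
    """Smallest r reaching t_critical in a sample of n pairs."""
    nu = n - 2
    return t_critical / sqrt(t_critical ** 2 + nu)

t_12_pairs = t_from_r(0.45, 12)             # Example XXXIII
t_27_pairs = t_from_r(0.45, 27)             # Example XXXIV
r_minimum = least_significant_r(2.12, 18)   # Example XXXV
print(round(t_12_pairs, 2), round(t_27_pairs, 2), round(r_minimum, 3))
```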


(b) z-Tests of the Significance of r

It has been noted that the distribution of $r$ in random samples from a normal population, in which the correlation coefficient = $\rho$, is far from normal even for large samples, and that the s.e. of $r$, i.e. $\dfrac{1 - r^2}{\sqrt{n}}$, has a limited application.

Prof. Fisher has shown that the transformation,

$$z = \tfrac{1}{2}\log_e \frac{1 + r}{1 - r}, \qquad \zeta = \tfrac{1}{2}\log_e \frac{1 + \rho}{1 - \rho},$$

defines a statistic $z$ whose distribution is approximately normal, with mean $\zeta$ and standard deviation $\dfrac{1}{\sqrt{n - 3}}$.

The statistic $z$ is used to test (i) whether an observed value of $r$ differs significantly from some hypothetical value, or (ii) whether two sample values of $r$ differ significantly.

For testing whether $r$ differs significantly from zero, the $t$-test is preferable.

Example XXXVI. A value of r of 0 - 6 is calculated from a random 



SIGNIFICANCE TESTS 


279 


sample of $9 pairs of observations from a normal population* Is this 
value of y consistent with the assumption that p = 0*4 ? 


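\[ z = \tfrac{1}{2}\log_e\frac{1 + 0.6}{1 - 0.6} = \tfrac{1}{2}(2.3026)\log_{10} 4 = 0.6932 \]

\[ \zeta = \tfrac{1}{2}\log_e\frac{1 + 0.4}{1 - 0.4} = \tfrac{1}{2}(2.3026)\log_{10} 2.333 = 0.4237 \]

Note: $\log_e A = \log_{10} A \div \log_{10} e = 2.3026 \log_{10} A$.

\[ z - \zeta = 0.2695, \qquad \text{s.e.} = \frac{1}{\sqrt{n - 3}} = \frac{1}{\sqrt{36}} = \frac{1}{6} \]

$\dfrac{z - \zeta}{\text{s.e.}}$ is a normal deviate.

\[ \frac{z - \zeta}{\text{s.e.}} = \frac{0.2695}{1/6} = 1.617. \]

The difference $z - \zeta$ is therefore not significant, i.e., $\rho$ may very well
be 0.4.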
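The working of Example XXXVI can be reproduced with natural logarithms directly, avoiding the conversion from common logarithms. The following Python sketch is illustrative only; the names `fisher_z` and `z_test` are assumptions, not taken from the text.

```python
import math

def fisher_z(r):
    """Fisher's transformation z = (1/2) log_e((1 + r)/(1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_test(r, rho, n):
    """Normal deviate (z - zeta)/s.e., where s.e. = 1/sqrt(n - 3)."""
    se = 1 / math.sqrt(n - 3)
    return (fisher_z(r) - fisher_z(rho)) / se

# Example XXXVI: r = 0.6 from 39 pairs, hypothetical rho = 0.4
print(round(fisher_z(0.6), 4), round(fisher_z(0.4), 4))
print(round(z_test(0.6, 0.4, 39), 3))
```

A deviate numerically less than 1.96 is not significant at the 0.05 level.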


The 95% F.L. of $\rho$

The value of $\zeta$ must make $|z - \zeta| < 1.96$ s.e., i.e., $< 1.96(\tfrac{1}{6})$, i.e.,
$< 0.327$.

\[ 0.693 - 0.327 < \zeta < 0.693 + 0.327 \]
i.e., $0.366 < \zeta < 1.020$.

Tables * are available for converting z into r, and vice versa; the
calculation, however, may be made as follows:


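For $z = 0.366$, $r = ?$

\[ 0.366 = 1.1513\log_{10}\frac{1 + r}{1 - r}, \qquad 0.3179 = \log_{10}\frac{1 + r}{1 - r}, \qquad 2.079 = \frac{1 + r}{1 - r}, \qquad r = 0.35 \]

For $z = 1.020$, $r = ?$

\[ 1.020 = 1.1513\log_{10}\frac{1 + r}{1 - r}, \qquad 0.8860 = \log_{10}\frac{1 + r}{1 - r}, \qquad 7.691 = \frac{1 + r}{1 - r}, \qquad r = 0.77 \]

i.e., $0.35 < \rho < 0.77$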
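Since z = ½ log_e{(1 + r)/(1 - r)} is the inverse hyperbolic tangent of r, the conversion back from z to r is simply r = tanh z. The Python sketch below uses this identity to obtain the limits for ρ in one step; the function name `fiducial_limits_rho` is an assumption made for illustration.

```python
import math

def fiducial_limits_rho(r, n, k=1.96):
    """Limits for rho: transform r to z, step k standard errors, transform back."""
    z = math.atanh(r)                  # same as 0.5 * log((1 + r)/(1 - r))
    half_width = k / math.sqrt(n - 3)  # k times the s.e. of z
    return math.tanh(z - half_width), math.tanh(z + half_width)

# r = 0.6 from 39 pairs, 95% limits
lower, upper = fiducial_limits_rho(0.6, 39)
print(round(lower, 2), round(upper, 2))
```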


Example XXXVII. Data of Example XXXIII. Apply the z-test of
significance.

If the variables in the population are uncorrelated, $\zeta$ and $\rho$ are zero.

* See Statistical Methods for Research Workers. Table Vb. R. A. Fisher 
(Oliver and Boyd). 





Hence the hypothesis to be tested is that z does not differ significantly
from 0.

\[ z = \tfrac{1}{2}\log_e\frac{1 + 0.45}{1 - 0.45} = 0.486 \]

\[ \text{s.e. of difference} = \frac{1}{\sqrt{n - 3}} = \frac{1}{\sqrt{9}} = \frac{1}{3} \]

\[ \frac{\text{difference}}{\text{s.e.}} = \frac{0.486}{1/3} = 1.458 \]

i.e., for 12 d.f. a value r = 0.45 is not significant.


Comparison of Correlation in Independent Samples

Two independent random samples have $n_1$ and $n_2$ pairs of observations
and coefficients of correlation $r_1$ and $r_2$ respectively.

If the samples are drawn from the same normal population,
$|z_1 - z_2|$ is normally distributed with

\[ \text{s.e.} = \sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}. \]

Example XXXVIII. Let $n_1 = 27$, $n_2 = 33$, $r_1 = 0.4$, $r_2 = 0.6$.

Are the values $r_1$ and $r_2$ significantly different?

$r_1 = 0.4$ gives $z_1 = 0.4237$. (See Example XXXVI.)
$r_2 = 0.6$ gives $z_2 = 0.6932$.

\[ |z_1 - z_2| = 0.2695, \qquad \text{s.e.}_{\text{diff.}} = \sqrt{\tfrac{1}{24} + \tfrac{1}{30}} = 0.274. \]

Then since $\dfrac{\text{diff.}}{\text{s.e.}}$ is a normal variate,

\[ \frac{\text{diff.}}{\text{s.e.}} = \frac{0.269}{0.274} = 0.98. \]

The difference is not at all significant. The hypothesis that the
samples were drawn from the same normal population is not discredited.
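The comparison of two independent correlation coefficients can be sketched in Python as follows. The function name `compare_correlations` is hypothetical; the sketch merely repeats the arithmetic of Example XXXVIII.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Return |z1 - z2|, its standard error, and their ratio (a normal deviate)."""
    diff = abs(math.atanh(r1) - math.atanh(r2))        # Fisher's z for each sample
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return diff, se, diff / se

diff, se, deviate = compare_correlations(0.4, 27, 0.6, 33)
print(round(diff, 4), round(se, 3), round(deviate, 2))
```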

(c) The Variance-ratio Test

Two independent random samples have means $\bar{x}_1$ and $\bar{x}_2$, and
variances $s_1^2$ and $s_2^2$ based on $(n_1 - 1)$ and $(n_2 - 1)$ degrees of
freedom respectively.

The object of the F-test is to discover whether the two independent
estimates of population variance differ significantly, or whether the
two samples may be regarded as drawn from the same normal
population of variance $\sigma^2$.

The F-Table

\[ \text{The variance ratio} = \frac{s_1^2}{s_2^2} = F. \]

The numerator is always the greater variance. The table (see
Table VI, page 350) is entered in the column corresponding to $\nu_1$,
which is the number of degrees of freedom of the greater variance, and
in the row corresponding to $\nu_2$, which is the number of degrees of
freedom of the smaller variance.

The intersection of column and row gives, in Table VI (i), the
value of F which is significant at the 0.05 level, and in Table VI (ii),
the value of F which is significant at the 0.01 level, for the given
values of $\nu_1$ and $\nu_2$.

Thus for $\nu_1 = 8$, $\nu_2 = 12$: $F_{(0.05)} = 2.85$, $F_{(0.01)} = 4.50$.

Thus the table gives the values of F which could be obtained in 
random sampling from a normal population at the stated probability 
levels and degrees of freedom. The test is applicable even when the 
distribution of the population departs considerably from the 
normal type. 


Example XXXIX. Data of Example XXXI.

Do the two estimates of population variance differ significantly?

\[ F = \frac{33.5/7}{23.7/6} = 1.21. \]

For $\nu_1 = 7$, $\nu_2 = 6$, $F_{(0.05)} = 4.2$.

The difference is very far from being significant, and the samples
may well be drawn from the same population.
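The rule that the greater variance is always the numerator is easily overlooked. A minimal Python sketch is given below, on the assumption that 33.5 and 23.7 in Example XXXIX are sums of squared deviations with 7 and 6 degrees of freedom; the function name `variance_ratio` is hypothetical.

```python
def variance_ratio(sum_sq_a, df_a, sum_sq_b, df_b):
    """Return F with the greater variance as numerator, then (nu1, nu2)."""
    var_a, var_b = sum_sq_a / df_a, sum_sq_b / df_b
    if var_a >= var_b:
        return var_a / var_b, df_a, df_b
    return var_b / var_a, df_b, df_a

f, nu1, nu2 = variance_ratio(33.5, 7, 23.7, 6)
print(round(f, 2), nu1, nu2)
```

The returned degrees of freedom give the column and row in which to enter the F-table.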


The t-test of the significance of the difference between the means
of two samples assumes that the sample variances are estimates of
the same population variance. The F-test may therefore be applied
to the variances before the t-test is made, but the t-test will be
affected only by a very considerable difference between the variances.

Numerous illustrations of the use of the F-test are given in the
following section.

Fisher's z Distribution

This z is not to be confused with the z of the previous section.

The statistic $z = \tfrac{1}{2}\log_e F$.

Prof. Fisher's tables * of the distribution of z give the values of z
which are significant at various levels for $\nu_1$ and $\nu_2$ d.f.

(d) Analysis of Variance

A set of N observations, with mean $\bar{x}$, variance $s^2$, is divided into
k sub-groups, with means $\bar{x}_i$, variances $s_i^2$ and frequencies $n_i$;
$i = 1, 2, 3, \ldots, k$; $\sum n_i = N$.


* See Statistical Methods for Research Workers . R. A. Fisher (Oliver & 
Boyd). 





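It has been shown (see page 146) that

\[ Ns^2 = \sum n_i(s_i^2 + c_i^2), \quad \text{where } c_i = |\bar{x}_i - \bar{x}| \]

\[ Ns^2 = n_1(s_1^2 + c_1^2) + n_2(s_2^2 + c_2^2) + \ldots \text{ to } k \text{ terms,} \]

i.e.,

\[ \sum(x - \bar{x})^2 = \{\sum(x_1 - \bar{x}_1)^2 + \sum(x_2 - \bar{x}_2)^2 + \ldots\} + \{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \ldots\} \]

\[ \sum(x - \bar{x})^2 = \sum(x - \bar{x}_i)^2 + \sum n_i(\bar{x}_i - \bar{x})^2. \]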

In words, the total sum of squares = the sum of squares due to 
variation within classes plus the sum of squares due to variation 
between class means. 

This resolution of the total sum of squares into several separate 
sums of squares, each of which corresponds to a distinct source of 
variation, is the first step in an analysis of variance. The resolution 
makes it possible to assess the relative importance of the different 
sources of variation. 

Example XL. Calculate the three terms of the above formula in 
respect of the following data: 

    Sample 1 :  9  7  6  5  8
    Sample 2 :  7  4  5  4  5
    Sample 3 :  6  5  6  7  6

                              Sample
                           1      2      3
                           9      7      6
                           7      4      5
                           6      5      6
                           5      4      7       N = 15
                           8      5      6
    $\sum x_i = T_i$:          35     25     30      $\sum T_i = \sum x = 90 = T$
    $\bar{x}_i$:                   7      5      6      $\bar{x} = 90/15 = 6$
    $(\sum x_i)^2 = T_i^2$:   1225    625    900      $\sum T_i^2 = 2750$

$\sum x^2 = (9^2 + 7^2 + 6^2 + \ldots + 6^2 + 7^2 + 6^2) = 568$.

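Then

(1) \[ \sum(x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{N} = \sum x^2 - \frac{T^2}{N} = 568 - \frac{90^2}{15} = 28 \]

(2) \[ \sum(x_1 - \bar{x}_1)^2 = \sum x_1^2 - \frac{(\sum x_1)^2}{n_1} = \sum x_1^2 - \frac{T_1^2}{n_1} = (9^2 + 7^2 + 6^2 + 5^2 + 8^2) - \frac{35^2}{5} = 255 - 245 = 10. \]

Similarly

\[ \sum(x_2 - \bar{x}_2)^2 = 131 - \frac{25^2}{5} = 6 \]

and

\[ \sum(x_3 - \bar{x}_3)^2 = 182 - \frac{30^2}{5} = 2 \]

i.e.,

\[ \sum(x - \bar{x}_i)^2 = 10 + 6 + 2 = 18 \]

(3) \[ n_1(\bar{x}_1 - \bar{x})^2 = 5(7 - 6)^2 = 5, \quad n_2(\bar{x}_2 - \bar{x})^2 = 5(5 - 6)^2 = 5, \quad n_3(\bar{x}_3 - \bar{x})^2 = 5(6 - 6)^2 = 0 \]

i.e.,

\[ \sum n_i(\bar{x}_i - \bar{x})^2 = 10 \]

thus we have,

\[ 28 = 18 + 10. \]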


The calculations should be made by using the following formulae:

(i) \[ \sum(x - \bar{x})^2 = \sum x^2 - \frac{T^2}{N} \]

(ii) \[ \sum(x - \bar{x}_i)^2 = \sum x^2 - \sum\frac{T_i^2}{n_i} \]

(iii) \[ \sum n_i(\bar{x}_i - \bar{x})^2 = \sum\frac{T_i^2}{n_i} - \frac{T^2}{N} \]

The derivation of (ii) is clearly indicated in the working out of (2)
above, while (iii) is obtained from (i) - (ii).

In calculating the “ mean square ”, or variance, from each of the 
above sums of squares, the appropriate divisor in each case is the 
number of degrees of freedom available. 

It will be seen that

    $\sum(x - \bar{x})^2$ has 15 - 1, i.e., N - 1 d.f.
    $\sum(x - \bar{x}_i)^2$ has 15 - 3, i.e., N - k d.f. (k = no. of samples)
    $\sum n_i(\bar{x}_i - \bar{x})^2$ has 3 - 1, i.e., k - 1 d.f.

The number of d.f. corresponding to the total sum of squares is equal
to the sum of the numbers of d.f. corresponding to the partial sums of
squares; i.e., the degrees of freedom, like the sums of squares, are
additive.


Example XLI. A test was given to five students taken at random from
the senior fifth form of each of the three Grammar Schools of a town.
The individual scores are given in the data of Example XL. Is there
any significant difference between the scores of the three schools?

In this example there is one “factor” or “criterion” of classification,
viz., schools.

The object of the test is to discover if the variation due to this factor,
i.e., the “between schools” variation, differs significantly from the
variation “within schools.” If it is significantly greater, the test
discriminates between the schools; if it is not, then the test affords
no evidence against the hypothesis that the fifteen individual scores are
random sample values of a normal variate.


The results obtained in Example XL are set out below.

    Source of variation.      d.f.    Sum of squares.    Mean square.     F.
    Between sample means        2          10                5           3.33
    Within samples             12          18                1.5          -
    Total                      14          28

For $\nu_1 = 2$, $\nu_2 = 12$ the variance ratio $F = \dfrac{5}{1.5} = 3.33$ is not
significant, $F_{(0.05)}$ being 3.88.
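The three computing formulae (total, within samples, between sample means) lend themselves to a short routine. The following Python sketch applies them to the data of Example XL; the function name `one_way_anova` and the dictionary keys are assumptions made for illustration.

```python
def one_way_anova(samples):
    """Resolve the total sum of squares for k samples (one factor)."""
    values = [x for s in samples for x in s]
    N, k = len(values), len(samples)
    correction = sum(values) ** 2 / N                       # T^2 / N
    total = sum(x * x for x in values) - correction         # formula (i)
    between = sum(sum(s) ** 2 / len(s) for s in samples) - correction  # (iii)
    within = total - between                                # (ii) = (i) - (iii)
    f = (between / (k - 1)) / (within / (N - k))
    return {"total": total, "within": within, "between": between,
            "df": (k - 1, N - k), "F": f}

scores = [[9, 7, 6, 5, 8], [7, 4, 5, 4, 5], [6, 5, 6, 7, 6]]
print(one_way_anova(scores))
```

The value of F returned is compared with the tabulated F for the degrees of freedom shown.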


The simplest form of analysis of variance occurs when there are
only two samples. It is then an alternative to the t-test for
determining if the difference between the means of two samples is
significant.

Example XLII. Two independent random samples, each of 8 individuals,
provide the following data. Test the significance of the difference
between the sample means.

    Height in inches ($x_1$) : 63 64 65 65 66 66 67 68
    Height in inches ($x_2$) : 69 66 67 67 66 68 68 69

(i) t-Test

\[ t = \frac{67.5 - 65.5}{1.414\sqrt{\tfrac{1}{8} + \tfrac{1}{8}}} = 2.828. \]

For $\nu = 14$, $t_{(0.02)} = 2.62$, $t_{(0.01)} = 2.98$.
The difference may, therefore, be regarded as definitely significant.


(ii) Analysis of Variance 

It is usually advantageous to reduce all the values of the variable by 
the same amount in order to simplify the arithmetic. This, of course, 
has no effect on the values obtained for the three sums of squares. In 
the following working, which has been given in full by way of further 
illustration of method, all the values have been reduced by 60. 

                   Sample 1.    Sample 2.
                       3            9
                       4            6
                       5            7
                       5            7
                       6            6
                       6            8
                       7            8
                       8            9
    $T_i$ :               44           60        $\sum T_i = T = 104$
    $T_i^2/n_i$ :        242          450        N = 16, $\bar{x} = 6.5$

    $\sum x^2 = 720$

\[ \sum(x - \bar{x})^2 = 720 - \frac{104^2}{16} = 720 - 676 = 44 \]

\[ \sum(x - \bar{x}_i)^2 = 720 - (242 + 450) = 720 - 692 = 28 \]

\[ \sum n_i(\bar{x}_i - \bar{x})^2 = 692 - 676 = 16 \]




                             Sums of squares.   d.f.   Mean square.    F.
    Between sample means           16             1        16           8
    Within samples                 28            14         2           -
    Total                          44            15

For $\nu_1 = 1$, $\nu_2 = 14$: $F_{(0.05)} = 4.60$, $F_{(0.01)} = 8.86$.



F = 8 is therefore significant not far from the 0.01 level.

When $\nu_1 = 1$, the t and F tests are simply related; $t = \sqrt{F}$.

In the above example t = 2.828, F = 8.0. The 5% and 1% points of
the F table for $\nu_1 = 1$, $\nu_2 = r$, are the squares of the 5% and 1%
significance values of the t-table for $\nu = r$.

Thus above, for $\nu_1 = 1$, $\nu_2 = 14$, $F_{(0.01)} = 8.86$;
for $\nu = 14$, $t_{(0.01)} = 2.98 = \sqrt{8.86}$.
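The relation t = √F for two samples may be verified numerically from the heights of Example XLII. The sketch below is illustrative; the function name `pooled_t` is an assumption, and the variance is pooled over both samples as the t-test requires.

```python
import math

def pooled_t(a, b):
    """Two-sample t using the pooled estimate of variance."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    within = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    variance = within / (na + nb - 2)          # "within samples" mean square
    return (mean_b - mean_a) / math.sqrt(variance * (1 / na + 1 / nb))

x1 = [63, 64, 65, 65, 66, 66, 67, 68]
x2 = [69, 66, 67, 67, 66, 68, 68, 69]
t = pooled_t(x1, x2)
print(round(t, 3), round(t * t, 3))   # t and its square, to compare with F
```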

Example XLIII. Samples of pigs of each of four types were fed on the 
same ration over a period. The figures in the table denote increase in 
weight in lb. per pig at the end of the period. Do the sample means 
differ significantly ? 



                          Type of pig.
                A.         B.         C.         D.
                6.1       14.6       11.5       13.4
               13.8       15.7       16.0       20.2
                8.7       11.8        9.0       12.9
               12.0       16.5       13.3       12.5
    $T_i$ :       40.6       58.6       49.8       59.0       T = 208
    $T_i^2$ :  1648.36    3433.96    2480.04    3481.00      $\sum T_i^2 = 11043.36$

\[ \sum x^2 = 2874.88; \qquad \frac{T^2}{N} = \frac{208^2}{16} = 2704 \]

\[ \sum(x - \bar{x})^2 = 2874.88 - 2704 = 170.88 \]

\[ \sum(x - \bar{x}_i)^2 = 2874.88 - \frac{11043.36}{4} = 2874.88 - 2760.84 = 114.04 \]

\[ \sum n_i(\bar{x}_i - \bar{x})^2 = 2760.84 - 2704 = 56.84. \]



                            Sums.     d.f.    Mean square.     F.
    Between type means:      56.84      3        18.95        1.99
    Within samples:         114.04     12         9.50         -
    Total:                  170.88     15

For $\nu_1 = 3$, $\nu_2 = 12$, $F_{(0.05)} = 3.49$.

Hence F = 1.99 is not at all significant.

It should be noted that the value of F gives only an all-over indication 
of significance; that is to say the fact that a calculated value of F is 
not significant does not necessarily mean that one, or more, of the 
differences between sample means may not be significant. 

It would of course be possible to apply the t-test to the difference
between any two means (starting naturally with the two means which
have the greatest difference) but it is better to proceed as follows.

The best estimate of the standard error of any sample mean is that
calculated from the “within samples” variance, which is the variance
of a single entry in the data.


\[ \text{s.e. of any mean} = \sqrt{\frac{9.5}{4}} \quad \text{(since there are four items in a sample)} \]

and the s.e. of the difference between two sample means

\[ = \sqrt{\frac{9.5}{4} + \frac{9.5}{4}} = 2.18. \]

The variance 9.5 corresponds to 12 d.f.

With $\nu = 12$, $t_{(0.05)} = 2.18$.

Now $\dfrac{\text{diff. bet. means}}{\text{s.e.}} \geq 2.18$ for significance at the 0.05 level,

i.e.,

\[ \frac{\text{diff.}}{2.18} \geq 2.18, \qquad \text{diff.} \geq 2.18 \times 2.18 = 4.75, \]

that is to say, the smallest difference between sample means which is
significant at the 0.05 level is 4.75.

The sample means are 10.15, 14.65, 12.45, 14.75.

    Samples.    Difference           Samples.    Difference
                between means.                   between means.
    A-B             4.5              B-C             2.2
    A-C             2.3              B-D             0.1
    A-D             4.6              C-D             2.3

None of the differences is significant at the 0.05 level, though A-B
and A-D are nearly so.
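The "least significant difference" procedure just described can be sketched in Python. The function name `least_significant_difference` is hypothetical, and the critical t (2.18 for 12 d.f.) is taken from the text rather than computed.

```python
import math
from itertools import combinations

def least_significant_difference(error_mean_square, n_per_mean, t_crit):
    """t_crit times the s.e. of the difference between two means of n items each."""
    return t_crit * math.sqrt(2 * error_mean_square / n_per_mean)

means = {"A": 10.15, "B": 14.65, "C": 12.45, "D": 14.75}
lsd = least_significant_difference(9.5, 4, 2.18)
print("least significant difference:", round(lsd, 2))
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    print(f"{a}-{b}", round(diff, 2), "sig." if diff > lsd else "N.S.")
```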





Two Factor Analysis 

Suppose the data of Example XLIII had referred to 4 pigs of each
of 4 types as before but that, instead of all the pigs having the same
ration, one of each type had been fed on ration 1, one of each type
on ration 2, and so on. The classification is now two-fold, columns
representing the type of pig, and rows representing the scale of
rationing.

The analysis of variance, in this case, consists in testing the 
significance of differences between row means as well as between 
column means. 

Example XLIV. Make an analysis of variance of the following data. 


                                   Type of pig
                           A.      B.      C.      D.      $T_j$       $T_j^2$
    Scale of ration   I    6.1    14.6    11.5    13.4     45.6     2079.36
                     II   13.8    15.7    16.0    20.2     65.7     4316.49
                    III    8.7    11.8     9.0    12.9     42.4     1797.76
                     IV   12.0    16.5    13.3    12.5     54.3     2948.49
                                                          208.0    11142.10


As in Example XLIII,

\[ N = 16, \quad T = 208, \quad \frac{T^2}{N} = 2704, \quad \sum(x - \bar{x})^2 = 170.88, \quad \sum n_i(\bar{x}_i - \bar{x})^2 = 56.84. \]

The subscript i refers to columns as before, the subscript j to rows.

$\sum n_j(\bar{x}_j - \bar{x})^2$ is calculated in the same way as $\sum n_i(\bar{x}_i - \bar{x})^2$, substituting
rows for columns. The sum of squares due to “error” or “residual”
variance is obtained by subtraction of the sum of the first two items
from the total (see table below).

\[ \sum n_j(\bar{x}_j - \bar{x})^2 = \sum\frac{T_j^2}{n_j} - \frac{T^2}{N} = \frac{11142.1}{4} - 2704 = 81.525. \]


    Source of                Sums of       Mean
    variation.     d.f.      squares.      square.      F.
    Columns          3        56.840       18.947      5.24   Sig. below 0.05 level
    Rows             3        81.525       27.175      7.52   Sig. below 0.01 level
    “Error”          9        32.515        3.613       -
    Total           15       170.880

The estimates of variance between “Types” and between “Rations”
are compared with error, or residual, variance.

The differences between column means are therefore significant and
the differences between row means highly significant.
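The two-factor resolution, with one observation in each cell, may be sketched in Python as below. The function name `two_way_anova` and the dictionary keys are assumptions made for illustration; the data are those of Example XLIV, rows being rations and columns types.

```python
def two_way_anova(table):
    """table[j][i] is the single observation in row j, column i."""
    r, c = len(table), len(table[0])
    correction = sum(map(sum, table)) ** 2 / (r * c)            # T^2 / N
    total = sum(x * x for row in table for x in row) - correction
    rows = sum(sum(row) ** 2 for row in table) / c - correction
    columns = sum(sum(col) ** 2 for col in zip(*table)) / r - correction
    error = total - rows - columns                               # by subtraction
    error_ms = error / ((r - 1) * (c - 1))
    return {"columns": columns, "rows": rows, "error": error, "total": total,
            "F_columns": columns / (c - 1) / error_ms,
            "F_rows": rows / (r - 1) / error_ms}

gains = [[6.1, 14.6, 11.5, 13.4],
         [13.8, 15.7, 16.0, 20.2],
         [8.7, 11.8, 9.0, 12.9],
         [12.0, 16.5, 13.3, 12.5]]
for name, value in two_way_anova(gains).items():
    print(name, round(value, 3))
```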





t-test of significance of difference between any pair of means.

The error variance, 3.613, corresponding to 9 d.f., is the estimated
variance of a single gain in weight. The variance of the mean of a
sample of 4 items is therefore $\dfrac{3.613}{4}$ and the variance of the difference
between two sample means $3.613 \times \tfrac{1}{2}$. The s.e. of the difference is
$\sqrt{1.8065} = 1.344$.

For 9 d.f., $t_{(0.05)} = 2.26$ and $t_{(0.01)} = 3.25$.

    least difference significant at 0.05 level = 2.26 × 1.344 = 3.04
    and least difference significant at 0.01 level = 3.25 × 1.344 = 4.37.

The “least significant differences” are the same for both column and
row means since $n_i = n_j$.
The column means are 10.15, 14.65, 12.45, 14.75.


    Types.   Difference.                    Types.   Difference.
    A-B         4.5    Highly sig.          B-C         2.2    N.S. (i.e., not significant)
    A-C         2.3    N.S.                 B-D         0.1    N.S.
    A-D         4.6    Highly sig.          C-D         2.3    N.S.


The differences between row means should be similarly tested. 

In two factor analysis what is called “ error ”, or “ residual ”, in 
addition to the variation due to the purely chance effects of random 
sampling, includes variation due to possible “ interaction ” between 
the two factors. The residual sum of squares which is the best 
available estimate of the true error variation is consequently too 
large if interaction exists, and the significance of the two " main 
effects ” is underestimated. 

In three factor analysis in addition to the main effects, due to the 
three factors, A, B and C, three first order interactions can be 
obtained, i.e., A X B, A x C and B x C. The “ residual ” sum of 
squares may still contain an element due to the possible second order 
interaction, A X B x C. For an explanation of the meaning of 
" interaction ” and for examples showing how its calculation is 
effected, the reader is referred to Industrial Experimentation , 
H.M.S.O., 1949. 

Three Factor Analysis 

The Latin Square

A Latin Square of “ order 4 ” is an arrangement of the four letters 
A, B, C, D in 4 rows and 4 columns, in such a way that each letter 
occurs once, and once only, in each row and in each column. 




The Latin Square arrangement permits of an analysis of three 
“ factors ", columns corresponding to one factor, rows to another 
and the letters to the third. 

A simple use of the Latin Square is in an experiment to test the
effects on yields of a variety of wheat of n different manurial
treatments. The columns and rows correspond to possible variations
in the fertility of the soil in the two directions, and a Latin Square
of order n is used to randomize the distribution of the n treatments.

There are twelve different arrangements of the letters in a Latin 
Square of order 3 and 576 in a square of order 4. For higher orders 
the number of different arrangements soon reaches astronomical 
dimensions. 

The number of degrees of freedom available for estimation of
error in a single 3 × 3 Latin Square is only 2; that of a 4 × 4
square is 6, and of a 5 × 5 square 12. Since the reliability of the
estimate of error increases with the number of degrees of freedom
on which it is based, single 3 × 3 and 4 × 4 squares are
unsatisfactory.

Where the design of the experiment includes more than one square 
(see page 291), the reliability of the estimate of error is greatly 
increased, being based on 6 d.f. in the case of two 3x3 squares, 
15 d.f. in the case of two 4x4 squares, and 28 d.f. in the case of 
two 5x5 squares. 


Example XLV. The table shows yield in lbs. per plot. The letters 
A, B, C, D, refer to four different manurial treatments. Carry out an 
analysis of variance. 


    A 260    B 300    C 335    D 371
    B 280    A 300    D 300    C 410
    D 320    C 345    B 340    A 254
    C 372    D 395    A 290    B 328

Working with x - 320:

                                                  $T_j$       $T_j^2$
            A -60    B -20    C  15    D  51      -14        196
            B -40    A -20    D -20    C  90       10        100     $T^2/N = 80^2/16 = 400$
            D   0    C  25    B  20    A -66      -21        441     $\sum x^2 = 32{,}000$
            C  52    D  75    A -30    B   8      105     11,025
    $T_i$ :    -48       60      -15       83       80     11,762     $\sum T_j^2/n_j = 2{,}940.5$
    $T_i^2$ : 2,304    3,600      225    6,889             13,018     $\sum T_i^2/n_i = 3{,}254.5$


The Sum of Squares Due to “Treatments”

                  A.         B.         C.         D.
                 -60        -40         52          0
                 -20        -20         25         75
                 -30         20         15        -20
                 -66          8         90         51
    $T_t$ :       -176        -32        182        106
    $T_t^2$ :   30,976      1,024     33,124     11,236       $\sum T_t^2/n_t = 19{,}090$

Sums of Squares

    Total       = 32,000  - 400 = 31,600
    Columns     = 3,254.5 - 400 =  2,854.5
    Rows        = 2,940.5 - 400 =  2,540.5
    Treatments  = 19,090  - 400 = 18,690
    Residual    =                   7,515


    Source of variance      Sum of squares    d.f.    Mean square     F.
    Columns                     2,854.5         3         951.5        -
    Rows                        2,540.5         3         846.8        -
    Treatments                 18,690           3       6,230         4.97
    Error or residual           7,515           6       1,252.5        -
    Total                      31,600.0        15



Neither in columns nor in rows is the variation at all significant 
since in each case the mean square is less than the mean square due 
to error. 

For $\nu_1 = 3$, $\nu_2 = 6$, $F_{(0.05)} = 4.76$, so that the variation due to
treatments is significant at the 5% level.

The standard error of the difference between the means of any two
rows, columns or treatments is $\sqrt{1252.5 \times \tfrac{1}{2}}$, i.e., 25.

The least difference between means which is significant

    at the 5% level = 2.45 × 25 = 61.25
    at the 2% level = 3.14 × 25 = 78.5
    at the 1% level = 3.71 × 25 = 92.75


    Treatments.   Difference.                        Treatments.   Difference.
    A-B              36     N.S.                     B-C              53.5   N.S.
    A-C              89.5   S. below 0.02 level      B-D              34.5   N.S.
    A-D              70.5   S. below 0.05 level      C-D              19.0   N.S.
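The Latin Square analysis of Example XLV may be checked with the Python sketch below. The function name `latin_square_anova` and the layout of its arguments (one string of treatment letters per row, with a parallel table of yields) are assumptions for illustration. The raw yields are used, since subtracting 320 from every value leaves the sums of squares unchanged.

```python
def latin_square_anova(letters, values):
    """letters[j][i] is the treatment in row j, column i; values holds the yields."""
    n = len(values)
    correction = sum(map(sum, values)) ** 2 / (n * n)            # T^2 / N
    total = sum(x * x for row in values for x in row) - correction
    rows = sum(sum(row) ** 2 for row in values) / n - correction
    columns = sum(sum(col) ** 2 for col in zip(*values)) / n - correction
    totals = {}
    for letter_row, value_row in zip(letters, values):
        for letter, x in zip(letter_row, value_row):
            totals[letter] = totals.get(letter, 0) + x           # treatment totals
    treatments = sum(t * t for t in totals.values()) / n - correction
    residual = total - rows - columns - treatments
    error_ms = residual / ((n - 1) * (n - 2))
    return {"total": total, "columns": columns, "rows": rows,
            "treatments": treatments, "residual": residual,
            "F_treatments": treatments / (n - 1) / error_ms}

letters = ["ABCD", "BADC", "DCBA", "CDAB"]
yields = [[260, 300, 335, 371],
          [280, 300, 300, 410],
          [320, 345, 340, 254],
          [372, 395, 290, 328]]
for name, value in latin_square_anova(letters, yields).items():
    print(name, round(value, 2))
```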


Suppose the design of the experiment takes the form of two Latin
Squares, each of order n. Then, if the subscripts s, c, r and t
correspond to squares, columns, rows and treatments, respectively, and
n = number of columns = number of rows in one square, it can be
shown that

\[ \sum(x - \bar{x})^2 = n^2\sum_s(\bar{x}_s - \bar{x})^2 + n\sum_c(\bar{x}_c - \bar{x}_s)^2 + n\sum_r(\bar{x}_r - \bar{x}_s)^2 + 2n\sum_t(\bar{x}_t - \bar{x})^2 + \sum d^2 \]

where $\sum d^2$ = “residual” sum of squares; and that for 2 blocks, the
degrees of freedom corresponding to the respective sums are

\[ N - 1 = (2n^2 - 1) = (2 - 1) + 2(n - 1) + 2(n - 1) + (n - 1) + (2n^2 - 5n + 3). \]

In the following example the resolution of the total sum of squares 
is effected (1) by following out the instructions of the above formula, 
item by item, and (2) by a routine method. 

Example XLVI. In the following table the variable is yield per plot 
and letters stand for treatments. The values of x have been reduced 
by a constant amount to simplify the working. 


                   Square 1.            Row               Square 2.            Row
                                        totals.                                totals.
              A  -2    C  10    B   7     15         C   7    A  -5    B   4      6
              B   1    A   1    C  10     12         A -11    B  -2    C  10     -3
              C  22    B   4    A  -5     21         B   1    C   1    A  -8     -6
    Column
    totals      21       15       12      48           -3       -6        6      -3

\[ \sum x^2 = 1161, \quad N = 2n^2 = 18, \quad T = 45, \quad \bar{x} = \tfrac{45}{18} = 2.5. \]

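Total:

\[ \sum(x - \bar{x})^2 = 1161 - \frac{45^2}{18} = 1161 - 112.5 = 1048.5. \]

Latin Squares:

\[ n^2\sum_s(\bar{x}_s - \bar{x})^2 = 9\{(5\tfrac{1}{3} - 2\tfrac{1}{2})^2 + (-\tfrac{1}{3} - 2\tfrac{1}{2})^2\} = \tfrac{1}{4}(289 + 289) = 144.5. \]

Columns:

                     Column means.       Square means.
    Square 1 :        7     5     4          5 1/3
    Square 2 :       -1    -2     2         - 1/3

\[ n\sum_c(\bar{x}_c - \bar{x}_s)^2 = 3\{(1\tfrac{2}{3})^2 + (\tfrac{1}{3})^2 + (1\tfrac{1}{3})^2 + (\tfrac{2}{3})^2 + (1\tfrac{2}{3})^2 + (2\tfrac{1}{3})^2\} = 40. \]

Rows:

\[ n\sum_r(\bar{x}_r - \bar{x}_s)^2 = 3\{(\tfrac{1}{3})^2 + (1\tfrac{1}{3})^2 + (1\tfrac{2}{3})^2 + (2\tfrac{1}{3})^2 + (\tfrac{2}{3})^2 + (1\tfrac{2}{3})^2\} = 40. \]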



Treatments:

                   A.       B.       C.
                   -2        1       22
                    1        4       10
                   -5        7       10
                  -11        1        7
                   -5       -2        1
                   -8        4       10
    Totals:       -30       15       60
    Means:         -5      2.5       10

\[ 2n\sum_t(\bar{x}_t - \bar{x})^2 = 6\{(-5 - 2.5)^2 + (2.5 - 2.5)^2 + (10 - 2.5)^2\} = 675. \]






                    Sum of               Mean
    Source.         squares.    d.f.     squares.     F.
    Squares          144.5        1       144.5       5.8     Sig. at 0.05 level
    Columns           40          4        10          -
    Rows              40          4        10          -
    Treatments       675          2       337.5      13.6     Highly sig.
    Error            149          6        24.8        -
    Total           1048.5       17

For $\nu_1 = 2$, $\nu_2 = 6$, $F_{(0.01)} = 10.92$.

t-test of significance of difference between means of two treatments.

S.e. of difference between means = $\sqrt{24.8 \times \tfrac{1}{3}} = 2.87$.

For $\nu = 6$, $t_{(0.05)} = 2.45$, $t_{(0.02)} = 3.14$, $t_{(0.01)} = 3.71$.

    least difference significant at 0.05 level =  7.0
    least difference significant at 0.02 level =  9.0
    least difference significant at 0.01 level = 10.6

    Treatments.    Difference.
    A-B               7.5      Sig. at 0.05 level
    A-C              15        Very highly sig.
    B-C               7.5      Sig. at 0.05 level


The above direct method of calculation becomes very long-winded 
and tiresome if n is large and recurring decimals are involved. In any 
case it is better to proceed as follows: 

Let $T_1 = \sum x$ for Square 1
    $T_2 = \sum x$ for Square 2

Sums of Squares

i denotes columns, j denotes rows, t denotes treatments.

(1) Latin Squares:

\[ \text{Sum} = \frac{1}{n^2}(T_1^2 + T_2^2) - \frac{T^2}{N} = \tfrac{1}{9}(48^2 + (-3)^2) - \frac{45^2}{18} = 144.5. \]

(2) Columns:

\[ \text{Sum} = \frac{1}{n}\sum T_i^2 - \frac{T_1^2}{n^2} - \frac{T_2^2}{n^2} = \tfrac{1}{3}(21^2 + 15^2 + 12^2 + (-3)^2 + (-6)^2 + 6^2) - \frac{48^2}{9} - \frac{(-3)^2}{9} = 40. \]

Note. There are six columns in all, but the proper divisor is 3 as
the 3 columns of each square are to be regarded as contributing
separately to the sum of squares.

(3) Rows:

\[ \text{Sum} = \frac{1}{n}\sum T_j^2 - \frac{T_1^2}{n^2} - \frac{T_2^2}{n^2} = \tfrac{1}{3}(15^2 + 12^2 + 21^2 + 6^2 + (-3)^2 + (-6)^2) - \frac{48^2}{9} - \frac{(-3)^2}{9} = 40. \]

See note above.

(4) Treatments:

\[ \text{Sum} = \frac{1}{2n}\sum T_t^2 - \frac{T^2}{2n^2} = \tfrac{1}{6}\{(-30)^2 + 15^2 + 60^2\} - \frac{45^2}{18} = 787.5 - 112.5 = 675. \]

The divisor is 6 because there are six entries for each treatment.
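The routine method above can be sketched in Python for any number of Latin Squares of the same order. The function name `latin_squares_sums` and the argument layout (a list of pairs, each holding the letters and the values of one square) are assumptions for illustration; the data are those of Example XLVI.

```python
def latin_squares_sums(squares):
    """squares: list of (letters, values) pairs, each an n x n Latin Square."""
    m, n = len(squares), len(squares[0][1])
    cells = [x for _, vals in squares for row in vals for x in row]
    correction = sum(cells) ** 2 / (m * n * n)                   # T^2 / N
    total = sum(x * x for x in cells) - correction
    square_term = sum(sum(map(sum, vals)) ** 2 for _, vals in squares) / n ** 2
    between_squares = square_term - correction
    columns = sum(sum(col) ** 2 for _, vals in squares
                  for col in zip(*vals)) / n - square_term
    rows = sum(sum(row) ** 2 for _, vals in squares
               for row in vals) / n - square_term
    totals = {}
    for letters, vals in squares:
        for letter_row, value_row in zip(letters, vals):
            for letter, x in zip(letter_row, value_row):
                totals[letter] = totals.get(letter, 0) + x
    treatments = sum(t * t for t in totals.values()) / (m * n) - correction
    error = total - between_squares - columns - rows - treatments
    return {"total": total, "squares": between_squares, "columns": columns,
            "rows": rows, "treatments": treatments, "error": error}

square_1 = (["ACB", "BAC", "CBA"], [[-2, 10, 7], [1, 1, 10], [22, 4, -5]])
square_2 = (["CAB", "ABC", "BCA"], [[7, -5, 4], [-11, -2, 10], [1, 1, -8]])
print(latin_squares_sums([square_1, square_2]))
```

The error sum of squares is obtained by subtraction, as in the table of the example.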


In a design which includes three Latin Squares the formula
becomes

\[ \sum(x - \bar{x})^2 = n^2\sum_s(\bar{x}_s - \bar{x})^2 + n\sum_c(\bar{x}_c - \bar{x}_s)^2 + n\sum_r(\bar{x}_r - \bar{x}_s)^2 + 3n\sum_t(\bar{x}_t - \bar{x})^2 + \sum d^2 \]

and the numbers of degrees of freedom

\[ N - 1 = 3n^2 - 1 = (3 - 1) + 3(n - 1) + 3(n - 1) + (n - 1) + (3n^2 - 7n + 4). \]

The following formulae are obtained by applying the analysis of
variance method to correlation and regression.

Significance of an Observed Correlation Ratio

\[ F = \frac{\eta^2}{1 - \eta^2} \times \frac{N - h}{h - 1}, \qquad \nu_1 = h - 1, \quad \nu_2 = N - h \]

For correlation ratio of y on x, $\eta = \eta_{yx}$,

    h = no. of arrays of y
    n = frequency in each array
    nh = N

and similarly for correlation ratio of x on y.

If the calculated value of F is significant at the 0.05 level it is
taken that the variables are really correlated in the population.

Test for Non-linearity of Regression

(1) \[ F = \frac{\eta^2 - r^2}{1 - \eta^2} \times \frac{N - h}{h - 2}, \qquad \nu_1 = h - 2, \quad \nu_2 = N - h. \]

(2) \[ \chi^2 = (N - h)\,\frac{\eta^2 - r^2}{1 - \eta^2}, \] entering the $\chi^2$ table with (h - 2) d.f.

If the calculated values of F or $\chi^2$ are significant at the 0.05 level
the regression departs significantly from linearity.

Example XLVII. A random sample of 85 pairs of observations with
8 arrays of y's gave a correlation ratio of y on x of 0.5. Is this value
of $\eta_{yx}$ significant?

\[ \nu_1 = 7, \quad \nu_2 = 77, \quad F = \frac{0.25}{0.75} \times \frac{77}{7} = 3.67. \]

This is significant beyond the 0.01 point; i.e., there is definitely
correlation between the variables in the population.

Example XLVIII. A random sample of 100 pairs of observations,
grouped in 8 arrays of x's, gave r = 0.4 and $\eta_{xy} = 0.5$. Does the
regression of x on y depart significantly from linearity?

(i) \[ F = \frac{0.25 - 0.16}{0.75} \times \frac{92}{6} = 1.84, \qquad \nu_1 = 6, \quad \nu_2 = 92. \]

This value of F is not significant at the 0.05 level. There is no
evidence that the regression is not linear.

(ii) \[ \chi^2 = 92 \times \frac{0.09}{0.75} = 11. \]

For $\nu = 6$ this value of $\chi^2$ is significant about the 0.10 level, i.e., it
would not be regarded as significant.
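The three formulae for the correlation ratio translate directly into Python. The function names below are hypothetical, introduced only to check the arithmetic of Examples XLVII and XLVIII; significance is still judged from the F and χ² tables.

```python
def f_correlation_ratio(eta, N, h):
    """F for an observed correlation ratio; nu1 = h - 1, nu2 = N - h."""
    return eta ** 2 / (1 - eta ** 2) * (N - h) / (h - 1)

def f_nonlinearity(eta, r, N, h):
    """F for departure from linear regression; nu1 = h - 2, nu2 = N - h."""
    return (eta ** 2 - r ** 2) / (1 - eta ** 2) * (N - h) / (h - 2)

def chi2_nonlinearity(eta, r, N, h):
    """Chi-squared form of the same test, with h - 2 d.f."""
    return (N - h) * (eta ** 2 - r ** 2) / (1 - eta ** 2)

print(round(f_correlation_ratio(0.5, 85, 8), 2))     # Example XLVII
print(round(f_nonlinearity(0.5, 0.4, 100, 8), 2))    # Example XLVIII (i)
print(round(chi2_nonlinearity(0.5, 0.4, 100, 8), 2)) # Example XLVIII (ii)
```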


(e) Statistical Quality Control 

The object of quality control is to enable the producer to study
the nature and extent of the variability of his product. Quality
control charts are part of a continuous inspection system. Samples
are drawn at frequent intervals, as manufacture proceeds, and
measurements made on certain dimensions. Averages and ranges,
for example, are then computed and plotted on the charts. As an
alternative to this method of direct measurement, in some cases, “go
and not-go” gauges are used, and the number of defective
components per sample noted and plotted on a “number defective” or
“fraction defective” chart.

The charts make it possible to distinguish between those variations 
which are due to “ chance ” causes and are an inevitable feature of any 
production system, and those which are due to " assignable ” causes, 
and hence are an indication that some investigation is necessary. 

The output of a machine, engaged on mass production, when all
assignable causes of error have been eliminated, has a characteristic
frequency distribution of the deviations of the individual
components from the process average. This distribution may be
approximately normal with mean $\bar{X}$ and standard deviation $\sigma$,
in which case only 5% of the individual measurements will be
expected to lie outside the range $\bar{X} \pm 1.96\sigma$ and only 2‰ outside
the range $\bar{X} \pm 3.09\sigma$.

Quality control charts, however, do not show individual
measurements, but sample averages, and even if the parent population
departs considerably from normality the sampling distribution of the
means of samples of n items is approximately normal with mean $\bar{X}$
and standard deviation $\dfrac{\sigma}{\sqrt{n}}$.

The control chart for averages has inner control limits at
$\bar{X} \pm 1.96\dfrac{s}{\sqrt{n}}$ and outer control limits at $\bar{X} \pm 3.09\dfrac{s}{\sqrt{n}}$, where s is
the estimate of the population standard deviation obtained from the
samples. The respective chances of a sample mean exceeding
these limits in either a positive or a negative sense are $\tfrac{1}{40}$ and $\tfrac{1}{1000}$.

The number of items in a sample, when control is being exercised 
through direct measurement, may be anything from two to ten or 
more. Small sampling technique is therefore necessary. 

In order to reduce the labour and time of calculation in the factory
to a minimum, special tables have been compiled; for these and for
practical instruction on the subject the reader should consult the
British Standards Institution’s B.S. 600R : 1942 * dealing with
“Quality Control Charts”.

The following examples are intended merely to illustrate the
computation of control limits in the cases of charts for averages and ranges.

* B.S. 600R : 1942 may be obtained from the British Standards Institution,
24-28 Victoria Street, London, S.W.1.



Example XLIX. Suppose a machine is producing a component with
a nominal dimension 1.00″. With everything under control it is found
that the average dimension of the output is 1.00″, with a standard
deviation of 0.01″.

Samples of n items are drawn at intervals and measured, and the
mean measurement for each sample is plotted on the control chart, with
control limits at $p_1 = \tfrac{1}{40}$ and $p_2 = \tfrac{1}{1000}$.

The position of the control limits may be calculated thus:

For samples of 4 items

\[ \text{Inner limits} = 1.00 \pm 1.96\,\frac{0.01}{\sqrt{4}}, \qquad \text{Outer limits} = 1.00 \pm 3.09\,\frac{0.01}{\sqrt{4}}. \]

For samples of 9 items

\[ \text{Inner limits} = 1.00 \pm 1.96\,\frac{0.01}{\sqrt{9}}, \qquad \text{Outer limits} = 1.00 \pm 3.09\,\frac{0.01}{\sqrt{9}}. \]

Table 10, B.S. 600R, gives $\dfrac{1.96}{\sqrt{n}}$ and $\dfrac{3.09}{\sqrt{n}}$ for samples with from 2
to 30 members, to shorten the above calculations.
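The limits of Example XLIX are left unevaluated in the text; a short Python sketch completes the arithmetic. The function name `control_limits` is an assumption, and the multipliers 1.96 and 3.09 are those quoted above.

```python
import math

def control_limits(process_mean, sigma, n):
    """Inner (1.96) and outer (3.09) control limits for means of n items."""
    se = sigma / math.sqrt(n)                    # s.e. of a sample mean
    inner = (process_mean - 1.96 * se, process_mean + 1.96 * se)
    outer = (process_mean - 3.09 * se, process_mean + 3.09 * se)
    return inner, outer

for n in (4, 9):
    inner, outer = control_limits(1.00, 0.01, n)
    print(n, [round(v, 4) for v in inner], [round(v, 4) for v in outer])
```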

When control is being established on a new job it is necessary to 
obtain estimates of the mean and standard deviation of the output. 
After a preliminary run, a number of samples—not less than ten and 
preferably considerably more—is taken, and the mean and standard 
deviation calculated. 

With ten or more samples of less than ten members each it is 
possible to get a good estimate of the standard deviation from the 
mean range, w, of the samples. The short table below shows how 
these estimates are obtained from w . 

Control limits are then established on the bases of the above 
estimates and are revised when more samples become available. 

Extract from Table 13 * (B.S. 600R : 1942)

To obtain the average value of w, ($\bar{w}$), multiply $\sigma$ by the
appropriate value of $d_n$.

For estimate of $\sigma$ divide $\bar{w}$ by appropriate value of $d_n$.

    No. in sample:    2       3       4       5       6       7       8       9       10
    $d_n$ :          1.128   1.693   2.059   2.326   2.534   2.704   2.847   2.970   3.076

* The values in Tables 13 and 13A are from a paper by Professor E. S.
Pearson, “The Probability Integral of the Range in Samples of n Observations
from a Normal Population”, Biometrika, Vol. 32, 1942.





The averages control chart plays an important part in quality
control, but it will be realized that although the means of n
components may be well within the inner control limits, the spread of the
individual components may be excessive. A chart showing the
range of each sample, or, alternatively, its standard deviation, with
appropriate control limits, is therefore a necessary complement of an
averages chart.

The ranges chart is often preferred on the score of simplicity,
although when the calculation of the standard deviation is reduced
to routine it should present no difficulty. The form of this
calculation which is advocated in B.S. 600R is illustrated in the next
example.

Example L. From the following data calculate the control limits for
sample means, and display them on a chart together with the sample
means. (Dimensions in cms.)

No. of sample:     1      2      3      4      5      6      7      8      9     10

                 1.01   1.02   1.04   1.01   1.00   0.98   0.95   0.92   1.00   0.99
                 0.98   1.03   1.00   1.01   1.02   0.97   1.05   0.95   1.04   1.01
                 1.02   0.95   1.04   0.97   1.01   1.00   1.00   0.98   1.02   1.02
                 1.03   0.96   1.05   0.93   0.96   1.01   0.97   1.05   1.00   0.98
                 0.99   1.00   0.99   1.04   0.97   0.96   1.03   1.06   0.96   0.96

Sum:             5.03   4.96   5.12   4.96   4.96   4.92   5.00   4.96   5.02   4.96

Mean, x̄_i:      1.006  0.992  1.024  0.992  0.992  0.984  1.000  0.992  1.004  0.992

w = range:       0.05   0.08   0.06   0.11   0.06   0.05   0.10   0.14   0.08   0.06


Process mean $= \bar{X} = \dfrac{\sum \bar{x}_i}{k} = \dfrac{9.978}{10} = 0.998$.

Mean range $= \bar{w} = \dfrac{\sum w}{k} = 0.079$.

$k$ = no. of samples = 10; $n$ = no. in sample = 5.

Estimated $\sigma = 0.079 \div 2.326 = 0.034$.


Control Chart Limits for Averages ($\bar{x}_i$)

Inner limits $= 0.998 \pm 0.029$, i.e., $0.998 \pm 1.96\,\dfrac{0.034}{\sqrt{5}}$.

Outer limits $= 0.998 \pm 0.046$.


Actually there is no need to estimate $\sigma$ from $\bar{w}$, since Table 10,
B.S. 600R gives a multiplier for $\bar{w}$, for given $n$, which enables the
limits to be obtained directly from $\bar{w}$.
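The working of Example L can be retraced from the raw measurements with a short Python sketch. The variable names are illustrative only, and the data are as read from the table of Example L. Carried to more figures, the half-widths come out near 0.030 and 0.047, against the printed 0.029 and 0.046; the printed figures appear to be cut off rather than rounded.

```python
from math import sqrt

# Ten samples of five components each (cm), as read from the table of Example L.
SAMPLES = [
    [1.01, 0.98, 1.02, 1.03, 0.99],
    [1.02, 1.03, 0.95, 0.96, 1.00],
    [1.04, 1.00, 1.04, 1.05, 0.99],
    [1.01, 1.01, 0.97, 0.93, 1.04],
    [1.00, 1.02, 1.01, 0.96, 0.97],
    [0.98, 0.97, 1.00, 1.01, 0.96],
    [0.95, 1.05, 1.00, 0.97, 1.03],
    [0.92, 0.95, 0.98, 1.05, 1.06],
    [1.00, 1.04, 1.02, 1.00, 0.96],
    [0.99, 1.01, 1.02, 0.98, 0.96],
]
D5 = 2.326  # d_n for samples of 5, from the extract of Table 13

n = len(SAMPLES[0])                          # no. in sample
k = len(SAMPLES)                             # no. of samples
means = [sum(s) / n for s in SAMPLES]        # sample means
ranges = [max(s) - min(s) for s in SAMPLES]  # sample ranges

process_mean = sum(means) / k
mean_range = sum(ranges) / k
sigma = mean_range / D5                      # estimate of sigma from the mean range

inner = 1.96 * sigma / sqrt(n)               # half-width of the inner limits
outer = 3.09 * sigma / sqrt(n)               # half-width of the outer limits

print(round(process_mean, 3), round(mean_range, 3), round(sigma, 3))
print(round(inner, 4), round(outer, 4))
```

Every sample mean in the data lies within the inner limits so computed, which is what the averages chart is meant to display.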





Calculation of $\sigma$ from the Ten Samples

Example LI.

                  Sample 1.                    Sample 2.
                 (Unit 0.01)                  (Unit 0.01)

   x       Diff. from 0.92, d    d^2     Diff. from 0.92, d    d^2

  1.01             9              81            10             100
  0.98             6              36            11             121
  1.02            10             100             3               9
  1.03            11             121             4              16
  0.99             7              49             8              64

  Sum             43             387            36             310

Sample 1: $\sum (x - \bar{x}_1)^2 = 387 - \dfrac{43^2}{5} = 17$ (in $0.01^2$ units);
$s = \sqrt{17/5} = 1.84$, i.e., $s = 0.018$ in ordinary units.

Sample 2: $\sum (x - \bar{x}_2)^2 = 310 - \dfrac{36^2}{5} = 51$ (in $0.01^2$ units);
$s = \sqrt{51/5} = 3.19$, i.e., $s = 0.032$ in ordinary units.

$s$ is not required unless it is proposed to use a control chart for standard
deviations.

The reader is advised to check the following results: 


Sample no.     $\sum (x - \bar{x}_i)^2$        $s$
                ((0.01)^2 units)        (natural units)

     1                17                    0.018
     2                51                    0.032
     3                29                    0.024
     4                73                    0.038
     5                27                    0.023
     6                17                    0.018
     7                68                    0.037
     8               151                    0.055
     9                35                    0.026
    10                23                    0.021

  Total              491                    0.292

Mean sample s.d. $= 0.292 \div 10 = 0.029$.


$n$ = no. of components in a sample; $k$ = no. of samples.

$\sigma^2 = \dfrac{\text{Total sum of squares}}{nk - k} = \dfrac{491}{50 - 10}$,

$\sigma = 3.5$,

i.e., $\sigma = 0.035$ in natural units, as compared with
$\sigma = 0.034$ obtained from the mean range.
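The variance-within-samples estimate can be checked with a minimal Python sketch. The helper name is hypothetical and the data are as read from Example L. Working from unrounded sums of squares the total is 490.8 in (0.01)^2 units, against 491 when the ten rounded entries are added, and the estimate of sigma is the same to three places.

```python
from math import sqrt

# Ten samples of five components each (cm), as read from the table of Example L.
SAMPLES = [
    [1.01, 0.98, 1.02, 1.03, 0.99],
    [1.02, 1.03, 0.95, 0.96, 1.00],
    [1.04, 1.00, 1.04, 1.05, 0.99],
    [1.01, 1.01, 0.97, 0.93, 1.04],
    [1.00, 1.02, 1.01, 0.96, 0.97],
    [0.98, 0.97, 1.00, 1.01, 0.96],
    [0.95, 1.05, 1.00, 0.97, 1.03],
    [0.92, 0.95, 0.98, 1.05, 1.06],
    [1.00, 1.04, 1.02, 1.00, 0.96],
    [0.99, 1.01, 1.02, 0.98, 0.96],
]


def sum_sq_about_mean(sample):
    """Sum of squared deviations of a sample from its own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


n = len(SAMPLES[0])   # no. of components in a sample
k = len(SAMPLES)      # no. of samples

within_ss = sum(sum_sq_about_mean(s) for s in SAMPLES)  # natural units squared
sigma = sqrt(within_ss / (n * k - k))                   # divisor nk - k = 40

print(round(within_ss * 1e4, 1))  # total sum of squares in (0.01)^2 units
print(round(sigma, 3))
```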

$\sigma$ may also be estimated from the standard deviations of the 10
samples. The estimate is not $\bar{s}$, however, but $\bar{s}$ multiplied by a
number taken from Table 11 (B.S. 600R), for the appropriate value of $n$.

In this case, with $\bar{s} = 0.029$, the estimate is $\sigma = 0.034$.


Three methods of estimating $\sigma$ have been given. The first, using
the mean range, is the simplest, and is accurate enough for the
purpose, if the restrictions noted above are observed.

The second, or variance within samples method, is the most
accurate. In the third method the reader will have noted that the
sample standard deviation, $s_i$, was obtained from
$\sqrt{\dfrac{\sum (x - \bar{x}_i)^2}{n}}$ instead of from
$\sqrt{\dfrac{\sum (x - \bar{x}_i)^2}{n - 1}}$. This does not cause any
error, since although $(n - 1)$ is the correct divisor, the
underestimation due to the use of $n$ is compensated for in the factor
used for converting $s$ into $\sigma$.


Control Chart Limits for Standard Deviation ($s$)

These can be found by multiplying $\sigma$ by appropriate values from
Table 11, B.S. 600R, with $n = 5$, $\sigma = 0.034$.

Lower Limits

Outer: 0.135(0.034) = 0.004.

Inner: 0.311(0.034) = 0.011.

Upper Limits

Inner: 1.493(0.034) = 0.050.

Outer: 1.922(0.034) = 0.065.

Control Chart Limits for Range ($w$)

These can be found either (i) by multiplying $\sigma$ by appropriate
values from Table 13, B.S. 600R, or (ii) by multiplying $\bar{w}$ by
appropriate values from Table 13A, B.S. 600R.

With $n = 5$, $\bar{w} = 0.079$,

Lower Limits

Outer: 0.16(0.079) = 0.013.

Inner: 0.37(0.079) = 0.029.

Upper Limits

Inner: 1.81(0.079) = 0.143.

Outer: 2.34(0.079) = 0.185.




In practice only the mean range, w, and the upper limits need be 
placed on the ranges chart. 
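A plotted range can be placed against the limits for the ranges chart with a few comparisons. The sketch below assumes the multipliers of the mean range used above for n = 5 and a mean range of 0.079; the function name and the wording of the three zones are illustrative, not taken from B.S. 600R.

```python
MEAN_RANGE = 0.079  # mean range of the ten samples of Example L (n = 5)

# Multipliers of the mean range for n = 5, as used in the text above.
LOWER_OUTER, LOWER_INNER = 0.16, 0.37
UPPER_INNER, UPPER_OUTER = 1.81, 2.34


def classify_range(w, mean_range=MEAN_RANGE):
    """Say where a sample range falls relative to the control limits."""
    if w > UPPER_OUTER * mean_range or w < LOWER_OUTER * mean_range:
        return "outside outer limits"
    if w > UPPER_INNER * mean_range or w < LOWER_INNER * mean_range:
        return "between inner and outer limits"
    return "inside inner limits"


print(classify_range(0.14))  # the largest range in Example L
print(classify_range(0.16))  # a hypothetical larger range
```

Since only the upper limits are usually drawn, the lower comparisons could be dropped in routine use.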














It will be seen from the diagram that all the points on both charts
are inside the inner control limits. So long as only the expected
proportion of points falls outside the control limits the process is
said to be under "statistical control".

Example LII. If an analysis of variance of the ten samples is made
the following results are obtained:

Data of Example L, coded as $(x - 1) \times 100$.

Sample no.:    I    II   III    IV     V    VI   VII  VIII    IX     X

               1     2     4     1     0    -2    -5    -8     0    -1
              -2     3     0     1     2    -3     5    -5     4     1
               2    -5     4    -3     1     0     0    -2     2     2
               3    -4     5    -7    -4     1    -3     5     0    -2
              -1     0    -1     4    -3    -4     3     6    -4    -4

T_i:           3    -4    12    -4    -4    -8     0    -4     2    -4     T = -11

T_i^2:         9    16   144    16    16    64     0    16     4    16     Sum = 301

$\sum x^2 = 551$, $N = 50$.

$\sum (x - \bar{x})^2 = 551 - \dfrac{11^2}{50} = 548.58$

$\sum (x - \bar{x}_i)^2 = 551 - \dfrac{301}{5} = 490.80$

$\sum n_i (\bar{x}_i - \bar{x})^2 = 60.2 - 2.42 = 57.78$

Source                     d.f.    Sum of squares    Mean square

Between sample means         9          57.78            6.42
Within samples              40         490.80           12.27
Total                       49         548.58


The variance between sample means, being less than the within 
samples variance, is plainly not significant; this is indicated graphically 
on the control chart for averages, where all the sample means lie within 
the inner control limits. Statistical quality control charts are, in fact, 
a visual form of analysis of variance. The averages chart pictures the 
between sample means variance; the ranges chart shows the within 
samples variance. 
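The sums of squares of Example LII can be reproduced with a small one-way analysis of variance in Python. This is a sketch under the assumption that the coded data are as read from the example; the function and dictionary keys are hypothetical names chosen for illustration.

```python
# Coded data of Example LII, (x - 1) * 100, one inner list per sample.
CODED = [
    [1, -2, 2, 3, -1],
    [2, 3, -5, -4, 0],
    [4, 0, 4, 5, -1],
    [1, 1, -3, -7, 4],
    [0, 2, 1, -4, -3],
    [-2, -3, 0, 1, -4],
    [-5, 5, 0, -3, 3],
    [-8, -5, -2, 5, 6],
    [0, 4, 2, 0, -4],
    [-1, 1, 2, -2, -4],
]


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and mean squares."""
    n_total = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / n_total            # T^2 / N
    total_ss = sum(x * x for g in groups for x in g) - correction
    between_ss = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    within_ss = total_ss - between_ss
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return {
        "total_ss": total_ss,
        "between_ss": between_ss,
        "within_ss": within_ss,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": between_ss / df_between,
        "ms_within": within_ss / df_within,
    }


table = one_way_anova(CODED)
for name, value in table.items():
    print(name, round(value, 2))
```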

The within samples variance must be divided by $100^2$ to reduce it to
the unit of the original table.





Example LIII. The nominal dimension of a component is 0.1350" with
a tolerance of ± 0.0035". The data given below are in 0.0001 units
above and below the nominal dimension; thus an entry of 15 means
0.1365 and an entry of -25 means 0.1325.

Calculate the mean dimension of the ten samples and the average
range. Calculate also the control chart limits for averages and for
ranges.

No. of sample:   I    II   III    IV     V    VI   VII  VIII    IX     X

                15     5     5   -15   -10     5     0   -25    -5   -10
               -25    10     0     0    10    -5   -10    -5   -15   -15
                25     0    -5     0     5     5    -5   -15    -5    10
                 0     0     5   -10    15    10    20     5    10   -10
                 5     0   -10   -10   -15    -5     5     0     5     5

T_i:            20    15    -5   -35     5    10    10   -40   -10   -20

x̄_i:             4     3    -1    -7     1     2     2    -8    -2    -4

w:              50    10    15    15    30    15    30    30    25    25

Process average $= \bar{X} = \dfrac{\sum \bar{x}_i}{k} = -1$, i.e., 0.1349.

Mean range $= \bar{w} = \dfrac{\sum w}{k} = 24.5$, i.e., 0.00245.

For samples of 5, $\sigma = \dfrac{\bar{w}}{2.326} = 10.53$, i.e., 0.00105.


Control Chart for Averages

Inner Limits $= 0.1349 \pm 1.96\,\dfrac{0.00105}{\sqrt{5}} = 0.1349 \pm 0.001$.

Outer Limits $= 0.1349 \pm 3.09\,\dfrac{0.00105}{\sqrt{5}} = 0.1349 \pm 0.0015$.

Or using Table 10 (B.S. 600R)

Inner Limits = 0.1349 ± 0.876 (0.0011).

Outer Limits = 0.1349 ± 1.382 (0.0011).


Control Chart Limits for Ranges. (See Table 13, B.S. 600R for
multipliers.)

Lower Limits:

Outer: 0.37 (0.0011) = 0.0004.

Inner: 0.85 (0.0011) = 0.0009.

Upper Limits:

Inner: 4.20 (0.0011) = 0.0046.

Outer: 5.45 (0.0011) = 0.0060.
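The figures of Example LIII can be retraced from the coded data with the Python sketch below. The variable names are illustrative. The entry printed indistinctly in sample IV is read as 0, which agrees with that sample's total of -35.

```python
from math import sqrt

# Coded data of Example LIII: units of 0.0001 in. about the nominal 0.1350 in.
# One inner list per sample; the unclear entry in sample IV is assumed to be 0.
CODED = [
    [15, -25, 25, 0, 5],
    [5, 10, 0, 0, 0],
    [5, 0, -5, 5, -10],
    [-15, 0, 0, -10, -10],
    [-10, 10, 5, 15, -15],
    [5, -5, 5, 10, -5],
    [0, -10, -5, 20, 5],
    [-25, -5, -15, 5, 0],
    [-5, -15, -5, 10, 5],
    [-10, -15, 10, -10, 5],
]
NOMINAL = 0.1350   # nominal dimension, inches
UNIT = 0.0001      # size of one coded unit, inches
D5 = 2.326         # d_n for samples of 5

n = len(CODED[0])
k = len(CODED)

coded_mean = sum(sum(s) / n for s in CODED) / k        # process average, coded
mean_range = sum(max(s) - min(s) for s in CODED) / k   # mean range, coded
sigma = mean_range / D5                                # coded units

process_average = NOMINAL + coded_mean * UNIT
inner = 1.96 * sigma / sqrt(n) * UNIT                  # inner half-width, inches
outer = 3.09 * sigma / sqrt(n) * UNIT                  # outer half-width, inches

print(coded_mean, mean_range, round(sigma, 2))
print(round(process_average, 4), round(inner, 5), round(outer, 5))
```

Keeping the arithmetic in coded units until the last step mirrors the hand calculation and avoids carrying long decimals.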


Example LIV. Calculate the "variance within samples" for all ten
samples in the foregoing example.

T_i:      20    15    -5   -35     5    10    10   -40   -10   -20

T_i^2:   400   225    25  1225    25   100   100  1600   100   400

$\dfrac{\sum T_i^2}{5} = 840$; $\sum x^2 = 5500$.

$\sum (x - \bar{x}_i)^2 = \sum x^2 - \dfrac{\sum T_i^2}{5} = 5500 - 840 = 4660$.

d.f. $= 50 - 10 = 40$; $\sigma = \sqrt{\dfrac{4660}{40}} = 10.79$, i.e., 0.0011,

which agrees with the estimate of $\sigma$ obtained from $\bar{w}$.

Completing the analysis of variance of the ten samples we have:

$\sum (x - \bar{x})^2 = 5500 - \dfrac{(-50)^2}{50} = 5500 - 50 = 5450$,

$\sum n_i (\bar{x}_i - \bar{x})^2 = 840 - 50 = 790$.
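The arithmetic of Example LIV needs only the ten sample totals and the sum of squares of the fifty coded entries. A minimal sketch, with illustrative variable names:

```python
from math import sqrt

# Sample totals (coded units of 0.0001 in.) and sum of squares of all entries,
# as given in Example LIV.
TOTALS = [20, 15, -5, -35, 5, 10, 10, -40, -10, -20]
SUM_X2 = 5500
n, k = 5, 10   # no. in sample, no. of samples

totals_term = sum(t * t for t in TOTALS) / n   # sum of T_i^2 / n
correction = sum(TOTALS) ** 2 / (n * k)        # T^2 / N

within_ss = SUM_X2 - totals_term               # within samples
total_ss = SUM_X2 - correction                 # total
between_ss = totals_term - correction          # between sample means

sigma = sqrt(within_ss / (n * k - k))          # coded units

print(within_ss, between_ss, total_ss, round(sigma, 2))
```

The within and between sums of squares add up to the total, which is a useful check on hand working.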


Reliability of the Charts

The control charts for averages and ranges, as so far described,
may show that the process of manufacture is under statistical
control, in that no more than the expected proportion of plotted
points falls outside the control limits, but at the same time the
product may be unsatisfactory, as too many of the components fall
outside the specified limits of tolerance. This state of affairs will
not be revealed by the charts unless certain conditions are satisfied.

(i) The average range, $\bar{w}$, must be less than $\dfrac{d_n}{2(3.09)}$ times the
distance between $T_U$ and $T_L$, the upper and lower limits of
tolerance.

Values of $L = \dfrac{d_n}{6.18}$ are given in the following table, for values
of $n$ from 2 to 10. For the table of values of $d_n$, see the extract from
Table 13 above.

n:     2     3     4     5     6     7     8     9    10

*L:  0.18  0.27  0.33  0.37  0.41  0.44  0.46  0.48  0.50

(ii) The outer control limits on the averages chart must lie
inside a band which is determined by laying off a distance $M\bar{w}$,
where $M = 3.09\left(\dfrac{1}{d_n} - \dfrac{1}{d_n\sqrt{n}}\right)$, inwards from the tolerance
limits. Values of $M$, for values of $n$ from 2 to 10, are given below.

n:     2     3     4     5     6     7     8     9    10

*M:  0.80  0.77  0.75  0.73  0.72  0.71  0.70  0.69  0.69
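Both factors follow directly from the d_n values, so the two tables can be checked by machine. The sketch below uses hypothetical function names and compares the computed factors with the printed ones to within 0.01; the computed L for n = 5 is about 0.376, where the table prints 0.37.

```python
from math import sqrt

# d_n factors for sample sizes 2 to 10, from the extract of Table 13.
D_N = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534,
       7: 2.704, 8: 2.847, 9: 2.970, 10: 3.076}

# Factors as printed in the two tables above.
PRINTED_L = {2: 0.18, 3: 0.27, 4: 0.33, 5: 0.37, 6: 0.41,
             7: 0.44, 8: 0.46, 9: 0.48, 10: 0.50}
PRINTED_M = {2: 0.80, 3: 0.77, 4: 0.75, 5: 0.73, 6: 0.72,
             7: 0.71, 8: 0.70, 9: 0.69, 10: 0.69}


def l_factor(n):
    """L = d_n / (2 * 3.09)."""
    return D_N[n] / (2 * 3.09)


def m_factor(n):
    """M = 3.09 * (1/d_n - 1/(d_n * sqrt(n)))."""
    return 3.09 * (1 / D_N[n] - 1 / (D_N[n] * sqrt(n)))


for n in sorted(D_N):
    print(n, round(l_factor(n), 3), round(m_factor(n), 3))
```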


Explanation of the Conditions

1. The individual items in a normal distribution have a range of
$2 \times 3.09\sigma$ for all practical purposes.

$\therefore\ 6.18\sigma$ must be less than $T_U - T_L$.

But $\sigma = \dfrac{\bar{w}}{d_n}$,

i.e., $6.18\,\dfrac{\bar{w}}{d_n} < T_U - T_L$,

and $\bar{w} < \dfrac{d_n}{6.18}\,(T_U - T_L)$, so that $L = \dfrac{d_n}{6.18}$.

2. Individual items of a normal distribution may be at a distance
of $3.09\sigma$ from the mean. Therefore $\bar{X}$ must be at least
$3.09\,\dfrac{\bar{w}}{d_n}$ from a tolerance limit.

The upper and lower control limits are drawn at
$\bar{X} \pm \dfrac{3.09\sigma}{\sqrt{n}}$, i.e., at $\bar{X} \pm 3.09\,\dfrac{\bar{w}}{d_n\sqrt{n}}$.

Therefore the distance between an outer control limit and a tolerance
limit must not be less than

$3.09\,\dfrac{\bar{w}}{d_n} - 3.09\,\dfrac{\bar{w}}{d_n\sqrt{n}}$,

i.e., $M\bar{w}$, and $M = 3.09\left(\dfrac{1}{d_n} - \dfrac{1}{d_n\sqrt{n}}\right)$.


* These and other tables with practical examples of Control Charts are 
given in A First Guide to Quality Control for Engineers , H.M.S.O., which the 
reader should consult. 





Conditions (1) and (2) Applied to Example LIII

(1) $L(T_U - T_L) = 0.37(0.007) = 0.0026$

$\bar{w} = 0.0025$

so that condition (1) is barely fulfilled.

(2) $T_U - M\bar{w} = 0.1385 - 0.73(0.0025) = 0.1367$
    $T_L + M\bar{w} = 0.1315 + 0.73(0.0025) = 0.1333$

The outer control limits are at 0.1363 and 0.1334, so that they are
both, narrowly, within the "allowable width of control limits",
A.W.C.L.
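The two conditions amount to three comparisons, which may be sketched as follows. The constants are the figures of Example LIII, with the unrounded mean range 0.00245 and the outer half-width 0.0015; the variable names are illustrative.

```python
T_UPPER, T_LOWER = 0.1385, 0.1315   # tolerance limits: 0.1350 +/- 0.0035
MEAN_RANGE = 0.00245                # mean range of Example LIII
PROCESS_AVERAGE = 0.1349
OUTER_HALF_WIDTH = 0.0015           # outer control limits on the averages chart
L5, M5 = 0.37, 0.73                 # tabulated L and M for samples of 5

# Condition (1): the mean range must be less than L times the tolerance width.
condition_1 = MEAN_RANGE < L5 * (T_UPPER - T_LOWER)

# Condition (2): the outer control limits must lie inside the band obtained by
# laying off M times the mean range inwards from each tolerance limit.
band_low = T_LOWER + M5 * MEAN_RANGE
band_high = T_UPPER - M5 * MEAN_RANGE
condition_2 = (band_low <= PROCESS_AVERAGE - OUTER_HALF_WIDTH
               and PROCESS_AVERAGE + OUTER_HALF_WIDTH <= band_high)

print(condition_1, condition_2)
print(round(band_low, 4), round(band_high, 4))
```

Both comparisons pass only narrowly, in line with the remark that the conditions are barely fulfilled.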

To gain an insight into the actual practice of statistical quality 
control the reader should consult, in addition to the books already 
mentioned, one or more of the books * dealing with the whole subject, 
which have recently been published. 

Exercise 10 

1. What is the probability of obtaining (i) 60 or more successes in 100
trials or (ii) 21 or more successes in 30 trials, if the chance of success at each
trial is £ ?

2. p = Find the value of m , if the chance of m or more successes in
40 trials is approximately

3. A coin comes down heads 65 times in 100 throws. Between what
limits does p, i.e., the chance of getting a head at a single throw with this
coin, almost certainly lie?

4. In a "multiple choice" test with 50 questions there are five possible
answers to each question, each equally likely if the subject is guessing. What
is the probability (i) of 16 or more, (ii) of 19 or more correct answers?

5. If the expectation is that 2% of men of exact age 60 years will die
within the year, and if out of a group of 900 such men 24 die within the year,
can the group be regarded as a random sample of such men?

6. At a certain date in a large city 400 out of a random sample of 500 men
were found to be smokers. After the tax on tobacco had been heavily
increased another random sample of 600 men in the same city included 400
smokers. Was the observed decrease in the proportion of smokers significant?

7. The average mortality rate for a certain industrial group of 9000 men
was 11 per thousand. What is the probability that the rate will exceed 12
per thousand in a given year, if the age distribution of the group remains
unchanged?

8. In Dullford out of a random sample of 400 householders 30% declared
that they were regular readers of the Daily Yell. In Sprightley the
proportion in a sample of 600 was 37%. Do these proportions differ
significantly?

* E.g. Quality Control by Statistical Methods, G. Herdan (Nelson & Sons);
Statistical Quality Control, E. L. Grant (McGraw-Hill).





9. The Registrar General gives the following estimate of children under 5
at mid-1947:

               England and Wales.    Scotland.

Males              1,813,000          228,000

Females            1,723,000          221,000

Total              3,536,000          449,000

Is the proportion of girls significantly higher in Scotland?

(L.U. B.Sc. Econ., 1948)

10. A life-insurance company had 1000 policies averaging £2000 on male
lives at age 25. A mortality table shows that of 85,824 men alive at 25,
84,816 are alive at 26. Find upper and lower limits for the amount which
the company would reasonably be expected to pay out during the year on
these policies.

11. In a sample of 1000 taken at random from a very large consignment of
apples landed in England from the "Icequeen" 100 were found to be unfit
for consumption. In a sample of 800 taken from the "Firefly's" large
consignment the proportion was | Is the difference between these
proportions significant? State the 95% limits to the proportion of defective
apples in each consignment.

12. Compare the probability of 10 or more correct answers in a test of 16
questions with the probability of 20 or more correct answers in a test of 32
questions, if for each question the probability of a correct answer is £.

13. Net income of 162 white families in Norfolk-Portsmouth, Virginia,
and 419 white families in Baltimore, Ohio, in 1934-36:

Annual net              No. of families.
income ($).     Norfolk-Portsmouth.    Baltimore.

   300-                  -                  4
   600-                 10                 45
   900-                 23                 95
  1200-                 40                120
  1500-                 32                 67
  1800-                 28                 51
  2100-                 20                 17
  2400-                  4                  9
  2700-                  2                  5
  3000-                  1                  3
  3300-                  2                  2
  3600-                  -                  1

  Total                162                419

Calculate for each town separately:

(a) The mean income; (b) the standard deviation of the income of a
single family; (c) the standard error of the mean, on the assumption that
the families included are a random sample of the white families in the
town.

Hence calculate the s.e. of the difference of the two mean incomes and
show that they differ significantly. (R.S.S. Certificate Specimen Paper)




14. In order to ascertain the age distribution of operatives in a certain
industry random samples of 1720 males and 1230 females were drawn. The
sample means and standard deviations were 33.93 years and 14.20 years for
the males and 27.44 years and 10.79 years for the females. Calculate the
95% fiducial limits (a) for the mean age of all male operatives, (b) for the
mean age of all female operatives.

15. Compute the first four moments of this distribution:

Values of x:    10-   20-   30-   40-   50-   60-   70-   Total

Frequency:        1    20    69   108    78    22     2     300

Does it seem likely that the distribution is normal? Test whether the
average is significantly different from 50, assuming a population standard
deviation of 10.

(L.U. B.Sc. Econ., 1949)

16. Is it likely that a sample of 300 items, with mean 16.0, is a random
sample from a large population whose mean is 16.8 and standard deviation
5.2? Calculate the 98% limits of the means of such samples.

17. The standard deviation of a population is 10 oz. What size of sample
should be taken to obtain an estimate of the mean that shall be in error by
not more than 0.5 oz., with a probability of not exceeding this error of 0.95?

18. In a random sample of 1000 members the mean is found to be 17.5.
In another independent sample of 800 members the mean is 18.

Can the samples have been drawn from the same population with standard
deviation 2.59?

19. A random sample of 100 potatoes harvested from one field has a mean
weight of 5 oz. and a standard deviation of 1.5 oz. A random sample of 150
harvested from another field has a mean weight of 4.5 oz. with the same
standard deviation. Discuss whether the mean weight of potatoes harvested
from the two fields differs significantly.

(R.S.S. Certificate Specimen Paper)

20. The mean of a large random sample is x̄ and the standard error of the
mean is 0.55. What is the probability that the sample mean differs from the
mean of the population by (i) ± 1.3, (ii) - 0.99, (iii) + 1.42, or more?

21. The mean scores for random samples of 140 male and 115 female
University students in an intelligence test were 102.0 and 99.6 respectively;
the standard deviations were 12.2 and 10.3.

(i) Do the data give any reason to doubt the hypothesis that men
and women students are of equal ability in this sort of test?

(ii) Is there any evidence of greater variability amongst male than
amongst female students?

22. A random sample of 500 is drawn from a large number of freshly
minted coins. The mean weight of the coins in the sample is 28.57 gms.
and the standard deviation is 1.26 gms. What are the limits which have
a 49 to 1 chance of including the mean weight of all the coins?

How large a sample would have to be drawn to make these limits differ by
only 0.1 gm., assuming that the standard deviation of the whole distribution
is 1.25 gms.?

23. A form of intelligence test was given to random samples of soldiers
and sailors in a certain country.

              No. in       Mean      Standard
              sample.     score.    deviation.

Soldiers        332       12.78       2.43

Sailors         615       12.99       2.48

(i) Is the difference between the mean scores significant?

(ii) Is the difference between the standard deviations significant?


24. Random samples drawn from two counties gave the following data,
relating to the height of adult males.

                          County A.    County B.

Mean height: inches         67.42        67.25

Standard deviation           2.58         2.5

No. in sample                1000         1200

(i) Is the difference between the means significant?

(ii) Is the difference between the standard deviations significant?

25. A man buys 100 electric light bulbs of each of two well-known makes,
taken at random from stock, for testing purposes. He finds that make A
has a mean life of 1300 hours with a standard deviation of 82 hours and
make B a mean life of 1248 hours with standard deviation 93 hours. Discuss
the significance of these results.

26. A random sample of 900 pairs of observations shows a coefficient of
correlation of 0.35. What are the 95% limits to the correlation in the
population? What would these limits be if the sample had had 3600 pairs
of observations?

27. A correlation coefficient of 0.2 is derived from a random sample of 625
pairs of observations. (i) Is this value of r significant? (ii) What are the
99% limits of the coefficient of correlation in the population?

28. The variable x is normally distributed in a population, with mean 68"
and standard deviation 2.5".

What is the size of the sample whose mean shall not differ from 68" by
more than 1%, with a probability of exceeding this difference of

If the difference is to be not more than 0.25%, other conditions remaining
the same, how large must the sample be?

29. Test the significance of the following results by the $\chi^2$ test and check
by using some other test:

(i) In a test of 10 "yes or no" questions the subject gives 8 correct
answers.

(ii) In a test of 30 "yes or no" questions the subject gives 24 correct
answers.

(iii) In a test of 52 questions to each of which there are, on the
hypothesis that the answers are pure guesses, four equally likely answers, the
subject gave (a) 18 correct answers, (b) 23 correct answers.

(iv) A random sample of 1600 drawn from a population containing
equal numbers of boys and girls is found to include 840 girls.

30. A die is thrown 132 times with the following result:

No. turned up:    1    2    3    4    5    6

Frequency:       16   20   25   14   29   28

Test the hypothesis that the die is unbiased.





31. Describe briefly two uses of the $\chi^2$ distribution. How do you
determine the number of degrees of freedom of the distribution appropriate to
a particular test?

In the construction of a table of random numbers, 15,000 digits were taken
from some logarithm tables, and the numbers of each digit obtained were as
follows:

Digit:          0     1     2     3     4     5     6     7     8     9

Frequency:   1493  1441  1461  1552  1494  1454  1613  1491  1482  1519

Test the hypothesis that each digit had an equal chance of being chosen.

(R.S.S. Certificate, 1949)

32. Ten coins were thrown 4096 times with the following result. Are the
data consistent with the hypothesis that the coins were unbiased?

No. of heads:   10    9    8    7    6    5    4    3    2    1    0

No. of times:    0   36  169  456  820  980  890  504  191   46    4

Apply the $\chi^2$ test.

33. The figures given below are (a) the theoretical frequencies of a
distribution and (b) the frequencies of the normal distribution having the same mean,
standard deviation and total frequency as in (a).

Apply the $\chi^2$ test of goodness of fit.

(a):   1   12   66  220  495  792  924  792  495  220   66   12    1

(b):   2   15   66  210  484  799  943  799  484  210   66   15    2

34. Apply the $\chi^2$ test of goodness of fit to the following data, where (a)
indicates observation and (b) has the same meaning as in Question 33.

x:    10-  12-  14-  16-  18-  20-  22-  24-  26-  28-  30-

(a):    3   15   48  110  215  250  199  125   60   20    5

(b):    5   17   52  118  193  233  206  134   63   22    7    $\sum f = 1050$

35. The following numbers of deaths from cancer of the adrenal gland were
recorded each month in a period of 24 months:

3, 4, 2, 1, 3, 4, 3, 2, 1, 6, 3, 5, 4, 2, 2, 0, 4, 3, 2, 6, 0, 4, 2, 0

Fit a Poisson curve to the distribution and describe how you would test the
goodness of fit. (L.U. B.Sc. Econ., 1949)

36. Two distributions of the digits, 0 to 9, are given below:

Digit:             0    1    2    3    4    5    6    7    8    9

Distribution A:    6    7   12    9    8   15   11    8   11   13    $\sum f = 100$

Distribution B:   18    6   14    7    7   16    7    8   12    5    $\sum f = 100$

What is the probability, in the case of each distribution, that it is a random
sample from a large population of digits, in which all the digits are present in
the same numbers?

37. England and Wales. Legitimate Maternities, Mothers Aged 16-19.

Number of Previous Children by Present Husband

     0        1      2    3    4    Total.

17,167    1,861    124    2    1    19,155

Compute the expected frequencies on the assumption that this is a Poisson
distribution. (L.U. B.Sc. Econ., 1949)

38. Apply the $\chi^2$ test of goodness of fit to the data of Question 37.

39. Data of Question 36, Exercise 6.

Carry out a test to see whether there is any evidence of association between
education and type of work. (R.S.S. Certificate, 1949)

40. The following data relate to the sales, in a time of trade depression,
of a certain proprietary article in wide demand. Do the data suggest that
the sales are significantly affected by the depression?

                              Districts.
Districts where        Not hit by
sales are:             depression.      Hit.     Totals.

Satisfactory               140           60        200

Not satisfactory            85           90        175

Totals                     225          150        375

41. Place of Midday Meal of Earners, Stepney, 1946

Earners with
weekly fares:         At home.    Elsewhere.    Totals.

2/9 and under           355          120          475

Over 2/9                 19           85          104

Total                   374          205          579

(Source: Unpublished material from a sample survey of Stepney.)

Assume that the sample is a random one from a large population. Is it
possible that there was no association between the place of midday meal and
fares in Stepney? (L.U. B.Sc. Econ., 1947)

42. The following table shows the number of recruits taking (1) a
preliminary and (2) a final test in car driving.

                          Preliminary.
                      Pass.    Fail.    Totals.

Final.    Pass         605      135       740

          Fail         195       65       260

          Totals:      800      200      1000

Use the $\chi^2$ test to discuss whether there is any association between the
results of the preliminary and those of the final test.

(R.S.S. Certificate Specimen Paper)

43. A certain type of surgical operation can be performed either with a local
anaesthetic or with a general anaesthetic. Results are given below.

              Alive.    Dead.

Local          511       24

General        173       21

Test for any difference in the mortality rates associated with the different
types of anaesthetic. What qualifications would have to be imposed on any
conclusion from such data as to the desirability of one or other method of
anaesthesia? (R.S.S. Certificate, 1948)

44. If the cell frequencies in (a) Question 42, (b) Question 43, had been
stated as percentages, what would be the value of $\chi^2$, calculated from these
frequencies, in each case?

45. Out of a group of 320 people exposed to infection 255 had not been
immunized, and of these 95 contracted the disease. Of those who had been
immunized 15 were infected. Does it seem that treatment gave any protection
against infection?

What is the difference in the significance of the result of the $\chi^2$ test according
as Yates' Correction is, or is not, applied?

46. Boot and Shoe Industry. Number of Persons Employed and
Condition of Premises. (Ministry of Labour Report.)

Condition                 No. of persons employed.                    Total
of              (i).        (ii).       (iii).       (iv).            no. of
premises.     Under 50.    51-150.     151-250.    Over 250.        factories.

(a)              84          133          49           62              328

(b)              87           82          20           25              214

(c)              26            9           -            1               36

Totals:         197          224          69           88              578

[Working Party Report: Boots and Shoes, H.M.S.O.]

(a) Factories in good premises: ample window space; able to comply
with Factory Act, 1937.

(b) Substantial structures: need considerable internal alteration to
comply with Act.

(c) Old and dilapidated: ought to be demolished.

Does it seem likely that there is no association between number employed
and condition of factory?

47. A random sample of 25 items is drawn from a normal population.
Given that $\bar{x} = 14.75$ and $\sum (x - \bar{x})^2 = 78$, test the assumption that the
mean of the population is 15.5.

48. A random sample of 16 out of a very large number of mass-produced
components gives a mean dimension of 12.05". The sum of squared deviations
from the mean of the sample is 0.0960. Is it likely that the mean dimension
in the population, assuming normal distribution, is 12.00?

Calculate the 98% fiducial limits of the population mean.

49. The heights of 10 men taken at random from a normal population
were 63, 64, 65, 67, 67, 68, 69, 69, 70, 72 ins. respectively. Do these data
support the hypothesis of a population mean of 66.1 ins.?

50. The mean of a random sample of 26 items from a normal population is
42.5 units. The "sum of squares" from this mean is 250. Is a population
mean of 45 units likely? What are the 99% fiducial limits of the population
mean?









51. A coin is tossed 16 times and the number of heads noted. This
experiment is made 10 times. The number of heads shown are 3, 5, 4, 8, 10, 9, 2,
7, 4, 8, respectively. Is it likely that the coin is biased?

52. An ordinary die is thrown 12 times and the appearance of a 2 or a 3 is
counted as a success. The experiment is made 12 times. The numbers of
successes at the 12 trials are 3, 4, 1, 4, 3, 2, 0, 4, 5, 5, 5, 2, respectively. Test
the assumption that the die is unbiased.

53. Ten cartons are taken at random from an automatic filling-machine.
The mean net weight of the 10 cartons is 15.90 oz., and the sum of squares
is 0.276. Does the sample mean differ significantly from the intended weight
of 16 oz.?

54. To test the desirability of a certain modification in typists' desks 9
typists were given two tests of as nearly as possible the same nature, one on
the desks in use, and the other on the new type. The following differences in
the number of words typed per minute were recorded. Are they significant?

Typist:                     A.   B.   C.   D.   E.   F.   G.   H.   I.

Increased no. of words
per min.:                    2    4    0    3   -1    4   -3    2    5

55. A group of 10 children was tested to find out how many digits each
child could repeat backwards from memory after hearing them read once
forward. The group was then given some practice and later a second test
was given. Is the difference between performance at the two tests significant?

Child:      A.   B.   C.   D.   E.   F.   G.   H.   I.   J.

Test I:      6    5    4    7    5    6    7    5    6    8
Test II:     7    6    6    7    6    7    7    6    8    9

56. Metal discs turned out by a machine have a nominal dimension of
0.5430 ins. Eight of these discs are taken at random and the dimension of
each is measured in turn by examiners A and B.

                Measurements.                     Measurements.
No. of disc.     A.      B.       No. of disc.     A.      B.

     1          -11      +4            5            0      -8
     2          +10     +15            6           -5      +5
     3           -8       0            7          +14      +8
     4           +2      +1            8           +5     +16

The measurements in the table represent the difference from 0.5430 in
0.0001 in. units, i.e., -11 is 0.5419, +10 is 0.5440, etc.

Do the two examiners differ significantly in their average results?




57. A sleeping draught and a neutral control preparation were tested in
turn (in random order) on 10 patients at a hospital and gave the following
results:

            No. of hours of sleep.               No. of hours of sleep.
Patient.     Drug.    Control.       Patient.     Drug.    Control.

   1         10.6       8.6             6         10.2       9.0
   2          7.5       7.3             7          7.1       6.5
   3          9.0       9.4             8          9.7       7.9
   4          5.4       5.1             9          8.5       8.7
   5          6.1       5.4            10          7.9       6.9

Calculate the appropriate value of t to test whether the drug has any
greater effect than the neutral control preparation, stating the number of
degrees of freedom and intimating how you would determine the significance
from a table of the t-distribution.

(R.S.S. Certificate Specimen Paper)

58. Supposing that the drug, data of Question 57, had been given to one set 
of 10 patients and the neutral control to another set of 10, with the above 
results, find if the difference between the means of the two sets is significant* 

59. A group of 7 seven-week old chickens reared on a high protein diet 
weigh 12, 15, 11, 16, 14, 14, 16 oz.; a second group of 5 chickens, similarly 
treated except that they receive a low protein diet, weigh 8, 10, 14, 10, 13 oz. 
Calculate the value of / and test whether there is significant evidence that 
additional protein has increased the weight of the chickens. Criticize the 
arrangement of the experiment, and suggest improvements in the design. 

(R.S.S. Certificate, 1947) 

60. A group of 8 psychology students were tested for their ability to 
remember certain material and their scores (number of items remembered) 
were as follows : 

A. B. C. D. E. F. G. H. 

19 14 13 16 19 18 16 17 


They were then given special training purporting to improve memory and 
were re-tested after a month. 

Their scores were then : 

A B. C. D. E. F. G. H. 

26 20 17 21 23 24 21 18 


A control group of 7 students were also tested and re-tested after a month 
but was not given special training. The scores in the two tests were : 

J. K. L. M. N. O. P. 

(1) : 21 19 16 22 18 20 19 

(2) : 21 23 16 24 17 17 16 

(l) Compare the change in each of the two groups by calculating t and 
test whether there is significant evidence to show the value of the special 
training. 

Is there evidence that the experiment was not properly designed ? 

(R.S.S. Certificate, 1949) 

(ii) Is the difference between the mean scores of the two groups,* with 
8 and 7 members respectively, at their first trials significant ? 

61. A random sample of 10 children, aged 14, from a type A school and a 
similar sample from a type B school, were given the test of Question 56. Is 
the difference between the performance of the groups significant ? 

No. of digits repeated. 

Type A:  6  5  4  7  5  6  7  5  7  8 
Type B:  7  8  5  6  7  8  6  8  7  9 


62. A random sample of 5 pigs, fed on diet X over a period, showed a 
mean increase in weight of 11.4 lbs. and a sum of squares of 35.2. Another 
random sample of 7 pigs, fed on diet Y over the same period, showed a mean 
increase of 14.4 lbs. and a sum of squares of 133.7. Are the means of the 
samples significantly different ? 





63. A random sample of 16 men from County A had a mean height of 
68.0 ins. and a sum of squares from the sample mean of 132. A random 
sample of 25 men from County B had a mean height of 66.5 ins. and a sum of 
squares of 165. Can the samples be regarded as random samples from the 
same normal population ? 

Calculate the 95% fiducial limits of the difference between the population 
means. 

64. Seven shops were taken at random in each of two large cities and the 
prices in dollars of a certain commodity noted. 

Price in dollars 

City P:  4.50  4.25  4.20  4.00  3.85  4.70  5.00 
City Q:  5.25  4.75  4.25  5.00  4.75  4.15  4.95 

Do the sample means differ significantly ? 

65. On a triangulation survey an angle was read 10 times by A and 8 
times by B, under practically identical conditions and with the same theodolite. 
Test the accuracy of the two surveyors. 

       A.                B. 

59° 38' 20.6"     59° 38' 19.7" 
        18.5"             18.6" 
        20.8"             22.8" 
        22.7"             21.9" 
        19.8"             20.6" 
        20.4"             18.4" 
        20.5"             19.0" 
        18.7"             20.5" 
        21.4" 
        21.3" 



66. A random sample of 18 pairs of observations from a normal population 
gives a correlation coefficient of + 0.52. Is it likely that the variables in 
the population are uncorrelated ? 

67. What is the least value of r in a random sample of 38 pairs that is 
significant (a) at the 0.05 level, (b) at the 0.01 level ? 

68. The scores in two tests given to 22 students were correlated and the 
value of r was found to be 0.42. Is this value of r significant ? 

69. How many pairs of observations must be included in a sample in order 
that an observed correlation coefficient of value 0.42 shall have a calculated 
value of t greater than 2.72 ? 
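
Questions 66 to 69 all turn on the same relation between r and t. The sketch below assumes the standard test of a zero population correlation, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom; the function names are hypothetical, and the critical value of t must still be taken from a table.

```python
from math import sqrt

def t_from_r(r, n):
    """t statistic for testing zero correlation from n pairs."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

def least_significant_r(t_crit, n):
    """Smallest r whose t reaches t_crit (the relation above inverted)."""
    return t_crit / sqrt(t_crit ** 2 + n - 2)

def least_pairs(r, t_target):
    """Smallest n for which the calculated t exceeds t_target."""
    n = 3
    while t_from_r(r, n) <= t_target:
        n += 1
    return n

# Question 69: r = 0.42, t must exceed 2.72.
print(least_pairs(0.42, 2.72))
```

For Question 67 one would call `least_significant_r` with the tabled value of t for 36 degrees of freedom at the chosen level.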

70. Two groups of students are given an intelligence test (x) and an 
arithmetic test (y ). 

$n_1 = 45$,  $r_{1xy} = 0.45$ 

$n_2 = 39$,  $r_{2xy} = 0.38$ 

Is the difference between the values of r significant ? 

71. A random sample of 30 pairs of observations from a normal population 
shows a correlation coefficient of 0.75. Is this consistent with the assumption 
that the correlation in the population is 0.55 ? 

72. Two independent samples have 28 and 19 pairs of observations with 
correlation coefficients 0.55 and 0.75 respectively. Are these values of r 
consistent with the hypothesis that the samples are drawn from the same 
population ? 
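
A comparison of two sample correlation coefficients, as in Questions 70 and 72, is commonly made through Fisher's z-transformation. The sketch below assumes that method (z = arctanh r, with sampling variance 1/(n − 3)); the function name is hypothetical and the figures are those of Question 72.

```python
from math import atanh, sqrt

def fisher_z_ratio(r1, n1, r2, n2):
    """Difference of the two z-values divided by its standard error."""
    z1, z2 = atanh(r1), atanh(r2)
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z2 - z1) / se

ratio = fisher_z_ratio(0.55, 28, 0.75, 19)
print(round(ratio, 3))
```

The ratio is then referred to the normal distribution; a value well below 1.96 would not be significant at the 0.05 level.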





73. In the above example let $r_2 = 0.75$ as before. Within what limits 
must $r_1$ then lie in order that $|r_1 - r_2|$ shall not be significant at the 0.05 level ? 

74. To assess the significance of possible variations in performance in a 
certain test as between the grammar schools of a city, a common test was 
given to a sample of 8 students taken at random from the senior fifth form of 
each of the four schools concerned. The results are given below. Make an 
analysis of variance of the data. 

School. 

  A.    B.    C.    D. 

  8     7     5    10 
  7     5     3     5 
  4     5     4     6 
  5     4     4     4 
  5     3     3     8 
  5     4     5     7 
  6     6     4     8 
  6     4     4     4 

75. Find the s.e. of the difference of the means of two schools in Question 
74, and make a table showing which differences are significant and at what level. 

76. The following table shows yield, in lbs. per plot. Four different 
treatments were used on 4 plots each. 

Is it possible that the differences in yield may be due to random sampling ? 
Find, and apply, the s.e. of the difference of the means of 2 treatments. 

Treatments. 

  I.     II.    III.    IV. 

 280     320    400     360 
 260     264    358     322 
 300     290    372     370 
 320     346    388     335 


77. Use the data of Question 76 to consider the hypotheses that the 4 
treatments do not differ in their effects on yield and that the 4 plots, i.e. 
rows, do not differ in fertility. Find the s.e. of the difference of the means 
of 2 treatments or 2 plots, and make out a table showing which differences 
are significant and at what level. 

78. The price of a certain commodity was ascertained in each of 4 towns, 
at each of 4 dates, one in each quarter of the year. Prices, in pence, are 
shown in the table. Are the variations between the different localities, and 
between the different seasons, significant ? 

                   Towns. 

Quarters.     A.     B.     C.     D. 

   I          6      5      6      5 
   II         5      4      6.5    5 
   III        4.5    3.5    4.5    6 
   IV         6.5    4.5    6      7 


79. In Question 78 find the s.e. of the difference of the means of 2 towns, 
and of 2 quarters, and hence show which of the differences (a) between 
towns and ( b) between quarters, are significant. 

80. An experiment was laid down in a field in the Fens to determine the 
effect of claying the ground on the yield of barley grain. A control and three 
amounts of claying were used as follows : 

A : No clay 
B : Clay at 100 tons per acre 
C : Clay at 200 tons per acre 
D : Clay at 300 tons per acre 

and the design used was a Latin Square. The yields were : 


D 29.1     B 18.9     C 29.4     A  5.7 
C 16.4     A 10.2     D 21.2     B 19.1 
A  5.4     D 38.8     B 24.0     C 37.0 
B 24.9     C 41.7     A  9.5     D 28.9 


measured in lb. per plot of 8 J yds. x 8 J yds. 

Carry out an analysis of variance and show how the analysis can answer 
the two questions : 

(a) Does the application of clay have any effect ? 

(b) Does the use of different amounts of clay have any effect ? 

Give estimates of any treatment effects you find, together with their 
standard errors. (R.S.S. Certificate, 1949) 

81. A patient was treated for a period of 3 days, and leucocyte counts were 
made in his blood immediately before treatment and after 1, 2 and 3 days. 

Four haemocytometer chambers I, II, III and IV were used, and the 
counts were made by 4 girls, A, B, C and D. 

The following table shows the arrangements of the experiment and the 
counts obtained. 

                               Chambers. 

Times.                  I.        II.       III.      IV. 

Before treatment       88 A     103 B      81 C      92 D 
1 day after            77 D      87 A     119 B      95 C 
2 days after          130 B     114 C     119 D     106 A 
3 days after          141 C     115 D      91 A     101 B 


Perform an analysis of variance on these results and comment on the 
conclusions shown. 

How far is residual error variance compatible with the hypothesis that 
other things being equal the distribution of leucocyte counts follows the 
Poisson law ? (R.S.S. Certificate, 1948) 

82. A field experiment to test the effect of superphosphate and gafsa rock 
phosphate on potatoes was laid out in a Latin Square with the following 
treatments 

A : No phosphate 

B : 0.33 cwt. P₂O₅ per acre 

^ . Q. 0 Q 6 r | as superphosphate 

D: 0-50 „ „ as gafsa 

The area of each plot was of an acre. The plan and yields in lb. per 
plot were as follows : 


C 312     A 232     B 298     D 300 
D 305     B 305     A 248     C 324 
A 256     D 317     C 330     B 312 
B 343     C 343     D 320     A 271 





Carry out an analysis of variance of these results and give a table showing 
the mean yield in tons per acre with each of the 4 treatments with the standard 
error of these means. (R.S.S. Certificate Specimen Paper) 

83. In an experiment to compare roughage (A) alone with limited (B) and 
full (C) grain rations for dairy cattle, 6 cows are used for 18 weeks. Each 
cow receives each ration for a period of 6 weeks and two 3 × 3 Latin Squares 
are used so as to balance the rations over the 3 periods. The yields of milk 
expressed as lb. per cow per 6 weeks are : 

                                  No. of cow 

Period.      1.          2.          3.          4.          5.          6. 

   I      1380 (A)    2090 (B)    2240 (C)    1860 (A)    1750 (B)    2010 (C) 
   II     1250 (B)    1860 (C)    1720 (A)    1760 (C)    1350 (A)    1630 (B) 
   III    1150 (C)    1390 (A)    1270 (B)    1460 (B)    1340 (C)    1010 (A) 


The letters in brackets after each yield show the ration. 

Assuming that any effect of ration on milk yield in a subsequent period is 
negligible, make an analysis of variance of these results. Give a table showing 
mean yields and their standard errors and state any conclusions you draw. 

(R.S.S. Certificate, 1947) 

Take w = (x − 1580)/10 instead of the table yields, to shorten the arithmetic. 

84. Calculate $\eta_x$ and $\eta_y$ from the following table and apply a test to discover 
whether either of the regression lines departs significantly from linearity. 

                        X 
            10-    20-    30-    40-    50- 

      20-    4      6     10      —      — 
      30-    2      5      9      4      — 
Y.    40-    —      6     15     10      4 
      50-    —      1      7     12      3 
      60-    —      —      5      8      2 


85. 
                              X 
           -3    -2    -1     0     1     2     3    Total 

     -3     —     —     1     6    15     —     —      22 
     -2     —     1    15    18    20    10     —      64 
     -1     —     2    12    20     3     9     6      52 
Y.    0     —    10     2     —     —     4     5      21 
      1     3     8     —     —     —     —     4      15 
      2     6     4     —     —     —     —     —      10 
      3    10     —     —     —     —     —     —      10 
      4     6     —     —     —     —     —     —       6 

Total      25    25    30    44    38    23    15     200 

Calculate $\eta_x$ and $\eta_y$ from the above data. Apply the tests for linearity 
of regression. 





86. A specified dimension of a machined component is 0.05 in. Five 
components are measured every half-hour, and the following table gives the 
results for 20 consecutive samples, as deviations from the specified value in 
units of 0.0001 in. Make control charts for the mean and the range, marking 
on them appropriate limits. Discuss what evidence the chart gives of unstable 
production and recalculate limits for a part of the series for which the 
range appears to be in control. 

Sample No. 
   1.    2.    3.    4.    5.    6.    7.    8.    9.   10. 

  +10   -11    -5    +7    -4    -1    +8   +11    +3    +8 
  +17    -1   -14    +4    +1    +3    +9    +4    +6     0 
   -1   +10    +3    +5    +6   +12    +2    +8    -4    +2 
   -1   +10   +14    +4    +2    -5    +3   +13    -3   -10 
  -15    +0    -5    +1    +7    +2    -4   +11    -8    -3 

Sample No. 
  11.   12.   13.   14.   15.   16.   17.   18.   19.   20. 

   -1    -4    +1    -3    -5    +4     0    -4    -2    -8 
   -8    +7    +4    -4    +4   +12    -3   -15    -2    -2 
   -1    -1    +7   +11    +2    -6    -5    -4   -14    -4 
    0    -1    +5    +3   +13    +5    +8    -6    -5    -2 
   -4    -7    +1     0    +1    +5   +13    -4    +1     0 

(R.S.S. Certificate, 1947) 


Additional Exercises on Chapters V—X 

1. (i) Show that $\sigma^2 = S^2 - \bar{x}^2$, where $S^2$ is the mean square deviation from 
arbitrary origin zero. 

(ii) Compute the variance of the following values of x, using the above 
relationship. 

x:  5.6  6.1  7.8  2.5  3.4  7.0  8.5  9.1  4.7  8.3 
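
Part (ii) can be checked numerically. The sketch below assumes the ten values read 5.6, 6.1, 7.8, 2.5, 3.4, 7.0, 8.5, 9.1, 4.7, 8.3 and applies the relationship of part (i): the variance is the mean of the squares (taken about zero) less the square of the mean. The variable names are illustrative only.

```python
x = [5.6, 6.1, 7.8, 2.5, 3.4, 7.0, 8.5, 9.1, 4.7, 8.3]
n = len(x)

mean = sum(x) / n                          # x-bar
mean_square = sum(v * v for v in x) / n    # S^2, mean square about origin zero
variance = mean_square - mean ** 2         # sigma^2 = S^2 - x-bar^2

print(round(mean, 3), round(variance, 3))
```

The divisor here is n, not n − 1, because the identity in part (i) concerns the variance of the values themselves, not an estimate for a population.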

2. From the following data compute the mean and standard deviation of 
the four sub-groups combined. 


Sub-group     Number in      Mean.     Standard 
              sub-group                deviation. 

    1             50         10.0        2.5 
    2            100         11.5        2.8 
    3            150         10.5        3.0 
    4            200         11.0        3.5 

Total:           500 
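
The combination asked for in Question 2 can be sketched as below. It assumes the quoted standard deviations use the divisor n, so that the combined variance is the weighted mean of each sub-group's variance plus the squared distance of its mean from the overall mean; `combine` is a hypothetical helper.

```python
from math import sqrt

# (number, mean, standard deviation) for the four sub-groups of Question 2.
groups = [(50, 10.0, 2.5), (100, 11.5, 2.8), (150, 10.5, 3.0), (200, 11.0, 3.5)]

def combine(groups):
    """Return (total number, combined mean, combined standard deviation)."""
    total = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total
    # Within-group variance plus the between-group displacement of each mean.
    var = sum(n * (s * s + (m - mean) ** 2) for n, m, s in groups) / total
    return total, mean, sqrt(var)

total, mean, sd = combine(groups)
print(total, round(mean, 2), round(sd, 2))
```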




3. x:  1.0  1.4  1.8  2.2  2.6  3.0  3.4  3.8  4.2  4.6  5.0  5.4  5.8   mid-points, cm. 

   f:    1   18   60  125  230  370  450  350  250  115   70   10    1   $\Sigma f = 2050$ 

Calculate (i) the mean and the median, (ii) the standard deviation and the 
mean deviation from the mean of the above distribution. 

4. Calculate (i) $\mu_3$ and $\mu_4$ and (ii) $\beta_1$ and $\beta_2$ of the distribution of Question 3. 

5. x:  10-  11-  12-  13-  14-  15-  16-  17-  18-  19-  20- 

   f:    5   18   30   54   63   73   61   52   25   12    7   $\Sigma f = 400$ 

Calculate (i) the mean, (ii) the mean deviation and the standard deviation 
of the above distribution. 





6. Calculate $\gamma_1$ and $\gamma_2$ of the distribution of Question 5. 

Does the distribution appear to be of the normal type ? 

7. Calculate the frequencies of the normal distribution which has the same 
total frequency, mean and standard deviation as the distribution of Question 5, 
for the class-intervals of that distribution. Apply the $\chi^2$-test of goodness of 
fit. 

8. Compute the 95% fiducial limits of the mean and standard deviation of 
the population from which the sample of Question 5 was drawn. 

9. If samples of 25 items are drawn at random from a population with 
mean 15.41 and standard deviation 2.133, within what limits, with a 
probability of 499 : 1, will the means of such samples lie ? 

10. The standard error of the median of a large sample of n items, drawn 
from a normal population of variance $\sigma^2$, is $1.253\,\sigma/\sqrt{n}$. Find the 95% 
fiducial limits of the population median from the data of Question 5. 

11. If random samples of n items are drawn from a finite population of N 
items and variance $\sigma^2$, the sampling variance of the means of such samples is 
$\dfrac{\sigma^2}{n}\cdot\dfrac{N-n}{N-1}$. Find the approximate relative error involved in using $\dfrac{\sigma}{\sqrt{n}}$ 
instead of $\dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}$ as the standard error. 


12.                     Length in cms. 

x:   18-   20-   22-   24-   26-   28-   30-   32-   34-   36- 
f:     2     2     5    13    26    52   104   162   201   171 

x:   38-   40-   42-   44-   46-   48-   50-52 
f:   112    69    40    22    11     5     3       $\Sigma f = 1000$ 

Calculate the values of $\bar{x}$ and $\sigma$ of the above distribution. 

13. Calculate the frequencies of the normal distribution with the values of 
N, $\bar{x}$ and $\sigma$ of Question 12, over the same intervals. 

14. Test whether or not the distribution of Question 12 is approximately 
of the normal type (i) by applying the $\chi^2$ test, (ii) by computing $\gamma_1$ and $\gamma_2$. 


15. Weekly Earnings in the X Industry, U.S.A. 

Weekly earnings (x) :  10-   15-   20-   25-   30-   35-   40-   45-   50-   55-60 

Sample A:               20    65   150   210   130    49    21     5     —     —     $\Sigma f = 650$ 
Sample B:                —    35    94   189   250   160    80    38    17     7     $\Sigma f = 870$ 

Calculate (i) the mean weekly earnings and (ii) the standard deviation of 
each group. 

16. Is the difference between the mean weekly earnings of the two groups in 
Question 15 significant ? 

17. Calculate the mean and standard deviation of the two groups in Question 
16 combined. What is the standard error of the mean ? Assuming that the 
two groups combined are a random sample drawn from the whole industry, 

state the 95% fiducial limits of the mean weekly earnings in this industry. 

18. The first four moments of a distribution about arbitrary origin 4 are 

'il, 3-5, 8*5, 33*6 respectively. Calculate (i) ^ and (ii) the first four 

moments about zero. 





19. Distribution of Families According to the Number of Persons in Each 

No. of persons :     1     2     3     4     5     6     7     8     9 
No. of families:   107   245   228   174   106    65    35    21    19    $\Sigma f = 1000$ 

Compute the average number of persons per family and the mean deviation 
from the average. (L.U. Inter. Com., 1938) 

Compute also the standard deviation. 

20. Compute the average ages of the mothers for these two distributions : 

England and Wales : Birth Registrations, July-December, 1938 
First Maternities : Interval Since Marriage—2 Years 

Age of 

mother : Under 20 20- 25- 30- 35- 40- 45 and over Total 

No.: 79 4950 8242 2873 585 90 7 16,826 

First Maternities : Interval Since Marriage—3 Years. 

Age of 

mother: Under 20 20- 25- 30- 35- 40- 45 and over Total 

No. : 7 1636 5766 2377 464 61 2 10,313 

(Inst. T., 1947) 

Compute the standard deviation for each distribution. Compute also the 
quartile and the mean deviations for the first group. 

21. In a distribution which is exactly normal, 9.1% of the frequencies are 
under 35 ins. and 5.5% are over 57 ins. Compute the mean and standard 
deviation of the distribution. 

22. What percentage of the frequencies of the distribution of Question 21 
are (i) greater than 60 ins., (ii) less than 32 ins., (iii) between 40 ins. and 50 ins. ? 

23. If the total frequency of the distribution of Question 21 is 1000, what 
are the ordinates of the curve at (i) x = 30, (ii) x = 40, (iii) x = 45, (iv) x = 57 ? 

24. Find (i) the area under the curve and (ii) the first three moments about 
zero of the frequency distribution $y = 6x - x^2$, where x has a range from 0 
to 6. Find also (iii) the second and third moments about the mean. 

25. Compute the quartile, mean and standard deviations of the frequency 
distribution of Question 24. 

26. Let a distribution be represented by $\phi(x) = e^{-x}$, from x = 0 to x = +∞. 
The rth moment about zero is 

$\nu_r = \int_0^{\infty} x^r e^{-x}\,dx = r!$ 

when r is a positive integer. 

Thus for this distribution $\nu_0$ = area = 1, 
$\nu_1$ = mean = 1, 
$\nu_2$ = 2!, etc. 

Calculate $\mu_2$, $\mu_3$ and $\mu_4$. 

27. The distribution defined by $\phi(x) = b e^{-bx}$, where b is positive and x 
ranges from 0 to +∞, has $\nu_r = r!/b^r$. Determine the second, third and 
fourth moments about the mean. 
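
Questions 26 and 27 both require moments about the mean to be obtained from moments about zero. A minimal sketch, assuming the binomial expansion of (x − mean)^k and exact rational arithmetic; the function names are hypothetical, and b = 2 is chosen only as an illustration.

```python
from fractions import Fraction
from math import comb, factorial

def central_moments(raw, orders=(2, 3, 4)):
    """raw[r] is the r-th moment about zero (raw[0] = 1).
    Returns the moments about the mean for the requested orders."""
    mean = raw[1]
    return [
        sum(comb(k, j) * raw[j] * (-mean) ** (k - j) for j in range(k + 1))
        for k in orders
    ]

def exponential_raw(b, upto=4):
    """Moments about zero, r!/b^r, of the distribution b*exp(-b*x)."""
    return [Fraction(factorial(r), b ** r) for r in range(upto + 1)]

print(central_moments(exponential_raw(1)))   # Question 26, b = 1
print(central_moments(exponential_raw(2)))   # Question 27 with b = 2
```

With b left general the same expansion gives each central moment as the b = 1 value divided by the matching power of b.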

28. (i) Let a frequency distribution be represented by an isosceles triangle 
of height b ; x ranges from −a to +a. Calculate the quartile, mean and 
standard deviations of the distribution. 

(ii) Calculate ft, and for the rectangular distribution <j>(x) « —, 




29. If the class-interval of a frequency distribution is £, or less, of the 
standard deviation calculated from the distribution, show that Sheppard's 
correction to the second moment makes a difference of £%, or less, in the 
estimate of standard deviation. What is the error if the class-interval is 
or less, of the calculated standard deviation ? 

30. The rth factorial moment of a distribution about the origin, x = 0, is 

$\nu_{(r)} = \dfrac{1}{N}\sum f\,x^{(r)}$, where $x^{(r)} = x(x - 1) \ldots (x - r + 1)$. 

Show that $\nu_2 = \nu_{(2)} + \bar{x}$, 
$\nu_3 = \nu_{(3)} + 3\nu_{(2)} + \bar{x}$, 
$\nu_4 = \nu_{(4)} + 6\nu_{(3)} + 7\nu_{(2)} + \bar{x}$. 
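
The three relations of Question 30 can be checked numerically before they are proved. The sketch below takes the factorial moment to be the mean of x(x − 1)…(x − r + 1) over the distribution and uses a small hypothetical frequency table; neither the table nor the function names come from the text.

```python
from fractions import Fraction

def falling(x, r):
    """x(x - 1)...(x - r + 1), the r-th falling factorial of x."""
    out = 1
    for i in range(r):
        out *= x - i
    return out

def moment(freq, power):
    """Ordinary moment about zero of a frequency table {x: f}."""
    n = sum(freq.values())
    return Fraction(sum(f * x ** power for x, f in freq.items()), n)

def factorial_moment(freq, r):
    """Factorial moment about zero of a frequency table {x: f}."""
    n = sum(freq.values())
    return Fraction(sum(f * falling(x, r) for x, f in freq.items()), n)

freq = {0: 3, 1: 7, 2: 5, 3: 4, 4: 1}   # hypothetical data
mean = moment(freq, 1)
f2, f3, f4 = (factorial_moment(freq, r) for r in (2, 3, 4))

print(moment(freq, 2) == f2 + mean)
print(moment(freq, 3) == f3 + 3 * f2 + mean)
print(moment(freq, 4) == f4 + 6 * f3 + 7 * f2 + mean)
```

The check works for any table because each relation is an identity between polynomials in x, averaged term by term.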

31. An urn contains 12 black and 8 white balls. Eight balls are drawn 
at random, without replacement. Calculate the probability of (i) 5 and (ii) 3 
white balls. 
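
Drawing without replacement, as in Question 31, leads to a ratio of counts of combinations. A minimal sketch, with the hypothetical helper `prob_white` and exact fractions so that no rounding enters:

```python
from fractions import Fraction
from math import comb

def prob_white(k, white=8, black=12, drawn=8):
    """Probability of exactly k white balls among the balls drawn,
    sampling without replacement."""
    favourable = comb(white, k) * comb(black, drawn - k)
    return Fraction(favourable, comb(white + black, drawn))

for k in (5, 3):
    p = prob_white(k)
    print(k, p, float(p))
```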

32. Three regular pentagonal prisms, with rectangular faces marked 1, 2, 
3, 4, 5 respectively, are thrown. 

(i) Determine the theoretical frequencies of sums of spots, 3 to 15, on 
the undermost faces. 

(ii) What are the theoretical mean and standard deviation when n 
such prisms are thrown ? 

33. A sample of 800 soldiers had a mean weight of 145 lb. Could it be a 
random sample from a population with mean 148 lb. and standard deviation 
16.2 lb. ? 

34. A random sample of 800 soldiers had an average weight of 145 lb. with 
a standard deviation of 15.8 lb. A random sample of 800 naval ratings had 
an average weight of 150 lb. and a standard deviation of 16.3 lb. 

(i) Is the difference between the means significant ? 

(ii) Do the data suggest that sailors are more variable than soldiers in 
respect of weight ? 

35. Compute the 98% fiducial limits (a) of the mean of the population of 
soldiers and (b) of the mean of the population of sailors, from the data of the 
previous question. 

36. What size of sample of soldiers would have to be taken if the difference 
between the 98% limits were to be not more than £ lb. ? 

37. Nine hundred working-class families were selected at random in each of 
two districts. It was found that 9% and 11% of the families respectively were 
overcrowded. Were the districts significantly different in respect of 
overcrowding ? 

38. A random sample of 300 bricks from kiln A had an average transverse 
strength of 991 lb. per sq. in. and a standard deviation of 225 lb. per sq. in. 

The corresponding figures for a random sample of 270 bricks from kiln B 
were 1000 lb. and 202 lb. per sq. in. Is (i) the difference between the means, 
(ii) the difference between the standard deviations, significant ? 

39. It is expected that 25 out of 1000 certain mass-produced articles, on the 
average, will be defective. What is the probability of getting 4 or more 
defectives in a random sample of 100 items, when the process is under control ? 

40. x : 20 25 30 35 40 45 50 56 

y : 240 315 403 450 488 520 525 532 



Fit, by the method of least squares, 

(i) the straight line of best fit; 

(ii) the 2nd degree parabola of best fit, to the given data. 

41. x :    0    5   10   15   20   25   30 

    y :  195  157  178  250  405  600  880 

Fit a 2nd-degree parabola to the above data by the method of least squares. 

42. Fit a 2nd-degree parabola to the following data : 

Mean length (in.) :  7.5  10.0  12.5  15.0  17.5  20.0  22.5 

Mean weight (oz.) :  1.9   4.5  10.1  17.6  27.8  40.8  56.9 

43. Fit a curve of the type $y = ab^x$ to the following data : 

x:   1     2     3     4     5     6     7 

y:  10   12.2  14.5  17.3  21.0  25.0  29.9 

44. Calculate the coefficient of correlation between the following series : 

x: 47 45 44 35 34 37 31 29 28 28 

y: 41 43 43 50 52 45 50 57 56 56 

45. n pairs of values of two variables x and y are given. The variances of 
x, y and (x − y) are $\sigma_x^2$, $\sigma_y^2$ and $\sigma_{(x-y)}^2$ respectively. Show that the correlation 
coefficient $r_{xy}$ is given by $\dfrac{\sigma_x^2 + \sigma_y^2 - \sigma_{(x-y)}^2}{2\sigma_x\sigma_y}$. 

46. n pairs of values of two variables a and b are given and each variable is 
ranked in order (1 to n). Show that the coefficient of correlation between 
ranks is given by $r = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$, where d is the difference between the 
ranks of a and b. Obtain r, given the following values : 

a:  7.4  9.0  11.0  2.5   4.6  0.5 

b:  8.5  6.1   2.4  6.7  12.6  3.3 

(L.U. B.Sc. Econ., 1950) 
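
The numerical part of Question 46 can be sketched as follows, using the rank-difference formula r = 1 − 6Σd²/(n(n² − 1)). The values are read as a: 7.4, 9.0, 11.0, 2.5, 4.6, 0.5 and b: 8.5, 6.1, 2.4, 6.7, 12.6, 3.3; the helper names are hypothetical, and the simple ranking used assumes there are no tied values.

```python
def ranks(values):
    """Rank each value from 1 (smallest) upwards; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def rank_correlation(a, b):
    """Coefficient of correlation between ranks from the d^2 formula."""
    n = len(a)
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))

a = [7.4, 9.0, 11.0, 2.5, 4.6, 0.5]
b = [8.5, 6.1, 2.4, 6.7, 12.6, 3.3]
r = rank_correlation(a, b)
print(round(r, 3))
```

Ranking from the largest value downwards instead would leave r unchanged, since every difference d keeps its magnitude.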

47. From the following data for the United States calculate the product-sum 
correlation between the variables and comment on the result : 

            Percentage of         Typhoid fever 
            population using      death-rate per 
            filtered water.       100,000 living. 

1900             8.7                    36 
1901            10.8                    34 
1902            11.9                    37 
1903            13.3                    38 
1904            16.0                    35 
1905            17.4                    30 
1906            20.5                    33 
1907            23.2                    32 
1908            23.3                    25 
1909            30.1                    21 
1910            34.6                    24 
1911            37.2                    20 
1912            42.4                    16 
1913            48.0                    16 

The standard deviations may be taken as 12.0 and 7.55 respectively. 

(L.U. B.Sc. Econ., 1932) 




48. Data of Question 44 of this Exercise. Calculate the regression 
equations. Calculate the standard error of estimate of y from the regression 
equation and check by formula. 

49. Soil temperature and germination interval (interval between sowing 
and appearance above ground) for winter wheat, 1926-7, in 12 places. 


Mean soil                        Mean soil 
temperature       Interval       temperature       Interval 
at 4 in. (° F.).  (days).        at 4 in. (° F.).  (days). 

     57             10                44             19 
     42             26                40             18 
     38             41                46             19 
     42             29                44             31 
     45             27                43             29 
     42             27                40             33 


Calculate to two decimal places the coefficient of correlation between soil 
temperature and germination interval and interpret the result. 

(R.S.S. Certificate, 1949) 


50. 
                              y 
          22-   24-   26-   28-   30-   32-   34-   36-    $f_x$ 

     44-    1     3     1     —     —     —     —     —      5 
     50-    1     4     3     2     —     —     —     —     10 
     56-    —     1     3     5     2     —     —     —     11 
x    62-    —     —     2     6     3     1     2     —     14 
     68-    —     —     1     6     5     2     3     1     18 
     74-    —     —     —     1     4     5     4     2     16 
     80-    —     —     —     —     1     2     1     2      6 

   $f_y$:   2     8    10    20    15    10    10     5     80 

Calculate $r_{xy}$ from the above data. 

51. Calculate the regression equations—data Question 50. 

52. Calculate $\sigma_{x-y}$ and $\sigma_{x+y}$ and check the value of $r_{xy}$ found in Question 50. 

53. Calculate $\eta_y$ and $\eta_x$ and test for linearity of regression in Question 50. 

54. The average score in a class of boys in an algebra examination (X) was 
53.2 with a standard deviation of 16.3. In a geometry examination (Y) the 
average score of the same class was 44.1, with a standard deviation of 18.7. 
The correlation coefficient between X and Y was found to be 0.68. 

Calculate the regression coefficients and write down the regression equations, 
Y on X and X on Y. 

55. A biased coin is thrown ten times; p = chance of obtaining a head at 
a single throw — Calculate, correct to 4 places of decimals, the probability 
of obtaining four or more heads in the ten throws. 

56. A rock-climb has four pitches. The odds in favour of a climber 
surmounting these four pitches are 5 to 2, 3 to 4, 3 to 5 and 2 to 7 respectively. 
What are the probabilities that he will (i) accomplish the whole climb, (ii) fail 
at the fourth pitch, (iii) fail at the third pitch ? 

57. A boy is marked for English, French and Science, with a maximum of 
6 marks for each subject and a minimum of zero. In how many ways can he 
get a total of (i) 6, (ii) 9, (iii) 11 ? 

58. Four regular tetrahedral dice are thrown. The faces of each die are 
marked 1, 2, 3, 4 respectively. Let r be the sum of the numbers on the 
undermost faces at any throw. State the generating function of the probability 
of any sum r; r = 4, 5, . . . 15, 16. 

59. Data as for Question 58, excepting that the numbers are 0, 1, 2, 3. 
What is the generating function of the probability of any sum r, r = 0, 
1, . . . 11, 12 ? 

60. A man throws a four-sided die in the form of a regular tetrahedron, 
with its four faces marked 1, 2, 3, 4 respectively. He is to receive 10/- if 
he throws a 4 at his first throw, 5/- if he throws a 4 at his second throw, 
2/6 if he throws a 4 at his third throw, and so on. What is the value of his 
expectation ? 

61. An ordinary cubical die is thrown twice and the product of the two 
numbers turned up is noted. How many different products are possible and 
which are the most likely ? If the die is thrown three times, what is the 
probability of a product of (i) 12, (ii) 24, (iii) 36 ? 

62. Ten cards, marked 1 to 10 respectively, are placed in a box and three 
are drawn at random and their sum noted. (i) What is the probability that 
their sum is 14 ? (ii) What is the probability that their sum is 17 or 18 ? 

63. A bag contains 10 blue and 5 green counters; a second bag contains 
6 blue and 4 green counters. A blue counter is worth 2/6 and a green counter 
1/-. A counter is drawn from each bag at random and, after both have been 
drawn, placed in the other bag. What is then a fair price for the first bag ? 

64. A bag contains x white, y black and z green balls. Three balls are 
drawn at random. What is the chance that they are all of different colours ? 

65. A tosses 1 die, B tosses 2 dice. They toss alternately, A having first 
toss, and the first to toss 6 (i.e., a total of 6 in B’s case) wins a prize of £3 1s. 
What is A's expectation ? 

66. Data of Question 65. What is A’s expectation if B has the first toss ? 
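
Questions 65 and 66 can be sketched with exact fractions. The sketch assumes that the prize reads £3 1s., i.e. 61 shillings, that A wins on a single 6 (chance 1/6) and that B wins on a total of 6 with two dice (chance 5/36). The helper `first_player_wins` is hypothetical; it sums the geometric series for the player who tosses first.

```python
from fractions import Fraction

P_A = Fraction(1, 6)     # A throws a 6 with one die
P_B = Fraction(5, 36)    # B throws a total of 6 with two dice
PRIZE_SHILLINGS = 61     # assumed reading of the prize, 3 pounds 1 shilling

def first_player_wins(p_first, p_second):
    """Chance that the player tossing first is the first to succeed."""
    both_fail = (1 - p_first) * (1 - p_second)
    return p_first / (1 - both_fail)

# Question 65: A has the first toss.
a_when_first = first_player_wins(P_A, P_B) * PRIZE_SHILLINGS
# Question 66: B has the first toss, so A wins whenever B does not.
a_when_second = (1 - first_player_wins(P_B, P_A)) * PRIZE_SHILLINGS

print(a_when_first, a_when_second)
```

Both expectations come out as whole numbers of shillings under this reading of the prize, which is some evidence that the reading is the intended one.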

67. In a certain trade 500 out of a sample of 3000 men of age 30 died before 
they were 55. In another trade 400 out of a sample of 2800 men of age 30 
died before they were 55. Is the difference between the proportions significant ? 

68. Of 82 flowering plants 68 had coloured and 14 had white flowers. The 
expectation was J coloured and i white. Is the difference from expectation 
significant ? 

69. A difference of x or more from the expected number in Question 68 is 
significant at the 0.005 level. What is x ? 



70. 
               No. in sample.    Coloured.    White. 

Sample I             76              64          12 
Sample II           120              99          21 


The expectation is | coloured and £ white. Test the significance of the 
difference from expectation for each sample separately and for the two 
samples combined, 




71. Test the assumption that the attributes, P and Q, are not associated in 
the population from which the following random sample was drawn. 

              P.      Not P.     Total. 

Q             43        19         62 
Not Q         17        21         38 

Total:        60        40        100 
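
The test of association in Question 71 is usually made with $\chi^2$ on one degree of freedom. The sketch below uses the short formula for a 2 × 2 table, N(ad − bc)² divided by the product of the four marginal totals, without any continuity correction; whether a correction should be applied is left open, and the function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for the table [[a, b], [c, d]], one degree of freedom,
    with no continuity correction."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / margins

# Cell frequencies of Question 71: Q row, then Not Q row.
chi2 = chi_square_2x2(43, 19, 17, 21)
print(round(chi2, 3))
```

The value is compared with the tabled $\chi^2$ for one degree of freedom at the chosen level of significance.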


72. The expectation in a certain experiment was J as the proportions 
of C, R and S respectively. The actual result was 18, 44, 38 respectively; 
can it be regarded as supporting the expectation ? 

73. A sample of 36 boys had a mean I.Q. of 97.2. Is it likely that the 
sample was drawn at random from a normal population with mean 100 and 
standard deviation 16 ? 

74. A random sample of 10 boys had the following I.Q.'s : 

70, 120, 110, 101, 88, 83, 95, 98, 107, 100. 

Do these data support the assumption of a population mean I.Q. of 100 ? 

75. Find the 95% fiducial limits of the population mean corresponding to 
the data of Question 74. 

76. Another sample of 10 boys had the following I.Q.’s : 

65, 70, 75, 80, 92, 87, 85, 92, 93, 88. 

Can this be a random sample from the same normal population as that 
from which the sample of Question 74 was drawn ? 

77. Breaking Strain of 0.105 in. Copper Wire (lb. per sq. in.) 

Sample I.     Sample II. 

   576           572 
   570           571 
   568           569 
   565           563 
   570           565 
   567           570 
   567           560 
   573           580 
   595           560 
   585           574 


Is the difference between the means of the two samples significant ? 

78. A coefficient of correlation of 0.3 is obtained from a random sample of 
27 pairs of observations from a normal population. Show that this value of 
r is not significant. 

79. Find the least value of r, data of Question 78, which is significant (i) at 
the 0.05 level, (ii) at the 0.02 level. 

80. The value of r obtained from a random sample of 19 pairs of observations 
from a normal population is 0.8. Is this value consistent with the hypothesis 
that the correlation in the population is 0.6 ? 

81. A correlation table shows 84 pairs of observations, with 8 arrays of 
y's. The value of $\eta_y$ calculated from the data is 0.45. Is this value significant 
of correlation between the variables in the population ? 




82. The value of r calculated from the data of Question 81 is 0.28. Test 
the hypothesis that the regression is linear. 


c 22 

* 18 

» 22 

b 20 

a 11 

B 14 

a 12 

» 12 

» 28 

B 21 

v 20 

c 18 

*> 14 

® 22 

» 15 

A 14 

o 20 

f 21 

E 30 

a 15 

f 20 

C 18 

B 18 

16 

F 25 

» 16 

<■' 23 

J> 16 

e 27 

a 10 

» 20 

25 

a 18 

f 22 

i> 17 

e 23 


The figures in the above table are yields in bushels per acre of a variety of 
wheat. The letters represent six different manurial treatments. Make an 
analysis of variance. 

84. Calculate the standard error of the total yield of a treatment, and the 
standard error of the difference between the means of two treatments. Make 
a table showing the levels at which the differences between the various means 
of treatments are significant. 



ANSWERS TO EXERCISES 


Exercise 1 


1. 

7-4, 7-4 

2. 5*85, 

6. 

3. 

1*85, 2, 1 

4. 

3-54, 3-53. 

5. 

3-78, 3, 3. 

6. 55. 


7. 

24 m.p.h. 

8. 

18-1%. 

9. 

3*83%, i.c, H.M. of 3, 

4, 5 

10. 

/3 1. 8. 



11. 

13*53, 13*50, 

12-50, 14- 

53. 

12. 

29-47, 29-23, 23-53, 35-25. 

14. 

52*36, 63*36, 

72-22. 


15. 

1*2%. 



16. 

6*4, 4, 16. 



17. 

67*7, 65*4, 

69*4. 


18. 

29*0, 26*2 : take upper 

Limit 65 

19. 

35*9, 34*4, 

29-5, 41-23. 

20. 

VfO-0 x 0-75 x 0 9 x 

~M~x0 

•9) - 

0*80; annual rate — 20%. 

21. 

+ 1-8%. 



22. 

B, 16*5%; 

C, 11*25%; all, 11* 

23. 

1939, 9-75%; 

; 1940, 9*41%. 

24. 

About 9 d 



25. 

About 7 id. and 10 id., ! 

9-12 d 





26. 

(i) 30 to 31 dollars, ( 11 ) 30*14, (in) 30*16, 

(iv) 28*89 to 31*26, (v) 40%, (vi): 


(v«) 9%. 







27. 

47-67, 47-62, 

38*25, 57*1 

*9, 47*5 





28. 

5*05, 5*14, 4*45, 5*77. 


29. 

3i 3*173, ! 

1*124 


30. 

16*65, 15*69, 

17*55. 


32. 

58*72 

33, 

40*15. 

35. 

24-1%. Heights of cols. 1 2", 3 

0*, 5* 

V, 4 2", 3 3 

", 1*75", 

, 0 5", 0*175' 

36. 

(a) 2421/717, 

, (b) 2148/717. 






W- (;)* 

(*)• 

m- 

(«*)• 

(w) 

(o) 

(/>)• 


42 

97 

0*43 

— 

— 

— 

— 


388 

559 

0*69 

— 

—. 

2 

— 


552 

550 

1*00 

— 

— 

— 

258 


548 

441 

1*24 

— 

—. 

236 

128 


465 

275 

1-69 

— 

225 

120 

60 


276 

153 

1*80 

— 

96 

66 

96 


126 

60 

2*10 

49 

21 

28 

28 


24 

13 

1*85 

— 

— 

24 

— 


2421 

2148 

M3 

49 

342 

476 

570 


37. Assuming a range of 10/- to 50/- mean wage is about 34*43/- per week. 

38. 8*15, 8*00, 7*845. 

42 . (1) g t — 2, the mean; (2) x i ~ 1-7573, the median. 

44 . 1942 : 192-2 mn. 1-06 tons; 1943 : 188-0 mn. 1-03 tons. 

46 . 26-606 grams. 

47 . (i) 2-04, 1-80, 13-19, 11-58; (ii) 6-48, 0-42; (iii) 77-7, 80-8. 

48 . (i) 3-3, 3-6, 3-9, 4-2, 4-0; limits l'-V; m.p. 1-5, 2-5, etc. 

(ii) 96-9, 82-3, 76-1, 49-7, 80-2. 

49. (i) 60-32, 49, 39, 61; (ii) 50-2, 49-3, 39-8, 62-0. 

60 . (a) 18-7, (6) 1-16. 

61. Assuming the mid-point of the interval 998.5-999.5 is 999, and so on, the mean is 1000.1 c.c.
See page 23 for note on class intervals and mid-points.



Exercise 2 

3. 27/8$. Yes, with suitable alterations in the amounts sold. 

s. d. '000. 

27 6 48 

28 9 1 

29 3 10 

27 3 20 Average = 27/8 

79 

4 . 4. 8. 2. 6. 3-473, 3-315 7. 126, 110 8. 125-4, 109-6. 

9 . Compare the results of taking several sets of approximate weights. 

10 . 16 per 1000. 11 . 63 3, 62-4; by 3 marks. 

12 . Approx. 5-65, 6-78, 8-47, 6-97. 

18 . (i) 2/10$, (ii) 3/2, (in) 2/7}, (iv) 2/10-7. 14 . 120-56, 108-71. 

15 . 13-05/-, 12-7635/-. 16 . 1-264 17 . 9-15%, 9-75%, 9-10%. 

18 . C.R. = 13-18, SR = 13-46; say 13-2 and 13-5. 

19 . C.R. = 13-32, S R. = 9-03. 

20 . + 24-8, - 11-4, - 5-7, - 7-7. 23. + 6-1, - 2-2, - 5-5, 1- 1-6. 

Corrected figures for 1937, 87-7, 94-5, 92-0, 92-1. 

24 . (a) + 3-6, + 17,- K-6, + 3-3. ( b ) 0-935, 0 972, 1-180, 0-932 

25 . + 19-5, - 3-7, -14-6, - 1-2. Corrected figures for 1931, 152-5, 149-3, 

152-4, 148 8. 

26 . - 4-7, - 7-5, - 8-3, + 20 5 

27. 1931 total value at 1930 prices would have been — 1065-43. 

U*ol 

Volume index for 1931, 1930 = 100, is (1065.43/1045) x 100 = 102.

Volume, or quantity, indices for 1932, . . . are 88.6, 90.1, 95.2, etc. See
Chapter III.

28 . 88-0, 96-4, 118 0, 97-0. 29. 98 1, 96-1, 95 6, 110*2. 

Exercise 3 

1 . 36*5. 2 . 10. 3. 12 4 . 43-75, 20. 

5 . (a) 100, 110-0, 123-2, 136-0, 126-2 

(b) 100, 110-0, 113-9, 110-2, 93-3, based on previous year 
100, 110-0, 125-3, 138-1, 128-8, based on 1934— chained. 

6. 100, 114-7, 131*3. 7. 96 2, 104-0. Apply reversal test. 

8 . 111*3, 110*4 9. (a) 114-3, 125*6, 153*7; ( b) 108*9, 139*5. 

10 . 1915 , 103*4; 1916, 137*6, with base June 1914. 

1915, 104-8; 1916, 139-4, with base annual average of 1914. 

18 . 194-8, 205*6. 13. Q 01 - 85. 15 . I - 24-8, Ij - 402*3 

16 . 126, 170 using weights 3, 1, 1; 128, 172 using weights 1948, 731, 712. 

17 . 100*7, with weights 123, 51, 173, 12 

18 . 69*26, 144*4. 19 . 69-9, 145-8. 20 . 256. 

21 . 100, 108-6, 122*4, 134*8, 125-2. 22. 87*8, 85-6, 85*6, 88*0, 88*9 (1931-35) 


23 . 

s. 


s . 


425,000 4-77 

293,000 

3*6 


3,376,000 2-93 

834,000 

1*77 

24 . 

3,801,000 3-14 

1933, 106-5; 1936, 140-6. 

1,127,000 

2*25 





25 . Quantity: 1946 on 1988 = 129-8. Price: 1946 on 1938 ~ 118-0. 

1938 „ 1946 = 76-3. 1938 „ 1946 = 83-9. 

26 . (a) 1935, 111-7; 1936, 122-0, etc. (6) Same as (a). 27 . 197-6, 


Year.    I.N. average values.    I.N. volume.

1930             100                 100
1931              81                 102
1932              75                  90
1933              71                  91
1934              74                  95
1935              75                  97
1936              79                 103


30. Net output (£'000):         5047    5862    7405
    Per person employed (£):     285     155     189

31. P₀₁ = 90.4. P₁₀ = 112.2. P₀₁′ = 89.2, Q₀₁′ = 96.6, V₀₁ = 87.3.

Q₀₁ = 98.0. Q₁₀ = 103.6. Apply check Q₀₁ x P₀₁′ = V₀₁.
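
The check quoted in answer 31 can be illustrated with a short sketch. It assumes that the unprimed symbols are base-weighted (Laspeyres) indices and the primed symbols are current-weighted (Paasche) indices; that reading is an assumption, though it agrees with the printed figures (90.4 x 96.6/100 = 87.3). The prices and quantities below are hypothetical and are not the data of the exercise.

```python
# Sketch only: hypothetical prices (p) and quantities (q) for years 0 and 1.
p0 = [10.0, 4.0, 2.5]
q0 = [20.0, 50.0, 80.0]
p1 = [9.0, 3.5, 2.4]
q1 = [22.0, 48.0, 75.0]


def aggregate(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


# Base-weighted (Laspeyres) indices.
P01 = 100 * aggregate(p1, q0) / aggregate(p0, q0)
Q01 = 100 * aggregate(p0, q1) / aggregate(p0, q0)

# Current-weighted (Paasche) indices, written with a prime in the answers.
P01_dash = 100 * aggregate(p1, q1) / aggregate(p0, q1)
Q01_dash = 100 * aggregate(p1, q1) / aggregate(p1, q0)

# Value index: total value in year 1 as a percentage of total value in year 0.
V01 = 100 * aggregate(p1, q1) / aggregate(p0, q0)

print(round(Q01 * P01_dash / 100, 4), round(P01 * Q01_dash / 100, 4), round(V01, 4))
```

Both products reproduce the value index exactly, which is why the check works whichever pair of indices has been computed.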

33. (i) 1938, 115.3; 1944, 283.2; 1945, 268.9. (ii) Check by reversal test.

34. (i) 118.5, 122.1; (ii) 113, 124. 35. 103.4. 36. 110.75, 123.94.

37. P₀₁ = 269, Q₀₁ = 178, P₁₀ = 37.3, Q₁₀ = 56.3.

38. (i) P₀₁′ = 221.6, Q₀₁ = 225.7; (ii) P₀₁′ = 358.8, Q₀₁ = 99.0.

39. U.K.: 104.4, 117.2, 114.8, 127.6, etc.

Exercise 4 

1. -f 500 m, ;t 500- 2. 3191 ±10; 2 significant figures, i.e , 3200. 

3. Say 1,668,500 ± 1%. 

4. Max. absolute error is < ± LW(r + y) ; max. relative error is< (x + y). 

5. (a) 70 ~ 74, (6) 302 ~ 338, (c) 777 ~ 665, ( d) 2870 ~ 3550. 

6 . Roughly $1,000,000 ± 15% 7. 1-465 ± 0-012 

8. Between 365,800 and 364,400. 

9. 1939, (85-0 ±0-1)%. 

1940, (1013-6 ± 1-2) mn 

1941, {1336-25 ± 2-05) mn. 

1942, (4773-73 -± 7-02) mn , etc 

10. A = B/C ± 5i% 

11. Max % error ~ 3% ; max absolute error 0-03.v 1 .v„.r 3 . 

12. (71-65 ± 1-25)% 

18. (a) 1921, 530.868 ± 6826; 1931, 443,700 ± 7400. % decrease lies 
between 18-9 and 13-9. 

(b) 1921, 721,436 ± 6832; 1931, 695,)30 ± 7418. % decrease lies 
between 5-6 and 1-7. 

14. Deposits, say £(4319 ± 16) mn.; Cash, say £(475 ± 4) mn. 

15. 260-28 ± 0-61, 220-3 ± 0-55 mn. tons ; (15-4 ± 0-4)% decrease. 

16. Maximum 41,145,000; minimum 40,733,000. 

17. A (13,572 ± 37) 10 s , B (7016 ± 27) 10 s , C (22,872 ± 48)10*. Total 

(43,460 ± 112)10*. 

18. Census figure for 1911 was 42,082,000. More accurately the % increases 

were 4-62% ( 1911-21 ) and 4-57% (1921-31). 

19. Say (7977 ± 6)10*. 20. 3-59% and 1-78%. 

B 

21. Max. error in 1000p ≈ (x + y)%.




22. Compare your answers with the following official figures:

                                 No. employed ('000).
                               June 1939.    April 1946.
Agriculture                        910          1009
Transport and Shipping            1273          1388
Mining                             873           790

23. (a) % males ≈ 45.9; (b) 45.9.

24. Between 4,634,000 and 4,590,000. 25. Roughly 3.19 ± 10%.

27. Wheat (33,554 ± 893) x 10³, oats (38,832 ± 1221) x 10³; all (88,502 ± 2612) x 10³.
Note that the absolute error of all grains is fixed, viz., ± 2612 x 10³, and
hence the method by difference is clearly wrong.

28 . % decrease = 100 - 100(l - j£ 0 )(l - jgjj)/(l - Wo) 

If x, y and z are small, % decrease 

Additional Exercises Ch. I—IV 

1. (i) 34-7, 17*8, 51-8. 

8. Assuming mid-intervals 32/6, 37/6, etc , x = 45/1. 

Quintiles : 40*85, 42*90, 44*95, 49*25 shillings 
4. 15*02, 14*84 ins 5. 73/6±, 92/11 6. 3*6, 10*8, 5*4 m.p.h. 


K Rows: 8 

74 

183 

188 

215 

306 

42 

13 

100 

186 

200 

233 

318 

44 

Columns: (j). 

(k) 

(j). 

(m) 

<«). 

(o). 

(/>)■ 

10 

19 

0*53 

-— 

— 

— 


58 

75 

0*77 

__ 

— 

6 

— 

389 

181 

1*04 

— 

3 

_ 

60 

148 

135 

1*09 


— 

12 

48 

235 

228 

103 


5 

15 

40 

240 

223 

1 08 

- 

— 

6 

72 

105 

88 

1*19 


— 

7 

84 

24 

17 

1*41 


— 

8 

16 

45 

30 

1*50 


9 

_ 

36 

40 

20 

2-00 


30 

30 

— 

J094 

1036 

1*08 


27 

84 

356 

1. Assume a range of 55/- to 105/-.

x (sh.).                     f (%).

55    and under  60             4
60    and under  62/6          11
62/6  and under  72/9          10
72/9  and under  78/9          15
78/9  and under  82/3          10
82/3  and under  85/4          10
85/4  and under  90/6          15
90/6  and under  95/-          10
95/-  and under 100/-          10
100/- and under 105/-           5

Total                         100

Assuming mid-intervals as 57/6, 61/3, etc., the mean is approximately 80/11.

9. (a) 33.93; (b) 31.72, 44.74, 21.49.
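
The mean of 80/11 quoted for the wage table can be checked with a short sketch. It assumes the class limits and percentage frequencies read from the table (55/- and under 60/-, 4%; 60/- and under 62/6, 11%; and so on up to 105/-) and takes each class mid-point as the average of its limits, as the answer states. The helper name `to_shillings` is purely illustrative.

```python
def to_shillings(shillings, pence=0):
    """Convert an s/d amount to decimal shillings (12 pence = 1 shilling)."""
    return shillings + pence / 12


# Class limits as (shillings, pence), read from the table in the answer.
limits = [(55, 0), (60, 0), (62, 6), (72, 9), (78, 9), (82, 3),
          (85, 4), (90, 6), (95, 0), (100, 0), (105, 0)]
freq = [4, 11, 10, 15, 10, 10, 15, 10, 10, 5]  # percentage frequencies

bounds = [to_shillings(s, d) for s, d in limits]
mids = [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]

mean = sum(f * m for f, m in zip(freq, mids)) / sum(freq)

# Convert back to shillings and pence, rounding to the nearest penny.
shillings, pence = divmod(round(mean * 12), 12)
print(f"{shillings}/{pence}")
```

Working in whole pence before splitting into shillings and pence avoids an answer such as 80/12 when the fractional part rounds up.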







10. Rises are equivalent to annual rise, based on salary of previous year, of 24.9%.

11. 4. 12. 2. 13. <*+»>V*° X y*. 14. 5. 

15. H.M. of rates 9*6 m.p h. 

16. Taking mid-intervals as 5, 15, etc., turnover is £21,550, average 

value — £28*733, median = £26-65. 

17. G.M. = 8-002, H.M. = 7-845 

18. (1) i(x -f y + z) ; (2) \(x -f 2y + 4 z) ; (3) 3 xyzj{xy + xz -f yz) per cent, 

19. Mean =o= 327, median =o= 155 

21, 46*46. 22. 46*92. 23. 46, 46-4. 

24. 47-90, 47-91. 25. 51-92, 43-86. 

26. (a) 149, 76; (c) 22-86, 11-60, 7-72, 2-94, 1-28, 0-070, etc. 

27. 9-3 days nearly. 28. 8-87 days; 1810. 29. 15-57 mn. 

30. (i) £570, £155; (ii) £4350, £4850. 

31.       No. of manshifts       Average output per
          worked (million).      manshift (tons).

J              13.52                  1.06
F              13.77                  1.06
M              17.68                  1.05
A              13.60                  1.03
M              13.88                  1.08
J              16.20                  1.08


32. Calculate median and quartile ages for each year. Obtain numbers
under 20, over 60, over 65, e.g., from an ogive for each year.

33. Using data as given, about 24-8% and 4-4%. 

84. ^non-e&rnere - 1*58, Earners - 1*78. (a) 47*2%, (b) 25*2%. 

35. Assuming range of 10-44 and intervals 9-5-14-5, 14*5-19-5, etc., means 

are approximately 19-8 and 21-1 cwt. 

36. Incomes : assuming range 20/—179/11, m p 29/11J, 49/11J, etc., mean — 

66/2-4, median = 56/3-7. 

Fuel: assuming range 2/—9/11, m p. 2/5J, 3/5J, etc., mean = 6/OJ, 
median = 5/11. 

37. Using data as given proportions are 0-077 (i.e., 7-7%), 0*181, 0-802, 

40. 36-99/-, 7*55/-, 9-80/-. 

41. (i) 12-5, 12*9, 13*5, 12-3, 11-6, 11-8%; (ii) 3300. 

42. 1938, £431-6 mn., £248*9 mn., £233-3 mn.; 1949, £968*3 mn , £773-0 mn,, 

£508-7 mn. 

43. (i) 7-42, 8-87; (ii) 9-26. 

44. (l) 70-6/-, 70/-, 74-1/-, 75/-; (n) 72-33/-, 75/-. 

45. 122-394 mn., 138-362 mn. 

46. (a) Deviations : 0*6, -f 1*6, — 0-6, — 1-8, 0, -f 0-8, -j- 1-6, — 0-6, 

- 2-0, - 0-4, 4* 1*2, -f 1-8. 

47. Average quarterly deviations : -f 15*5, — 6*5, — 13-3, -f 4-3. 

48. (2) (c) + 31-4, - 14-9, - 5*6, - 10*9; ( d) 1-598, 0-782, 0-937, 0-854. 

49. Daily corrections: -f 5-32, — 2-14, -r- 1*21, — 1-35, — 0-53, — 2*26, 4* 2-14. 
51. Corrections: 4- 26-0, - 6-8, 4- 7*4, - 26-6. 

53. A.M.S. 236*1, 42*84; G.M.s. 234*7, 42*6. 

54. 136*2, 139*1. * 55. 78*55. 

56. (i) 100, 101*8, 109*8, 125*8, 130 0; (ii) 100, 101-8, 115*4, 132-2, 137*8. 





57. 80.8, 82.6, 87.4, 100, 104.2.

                 Average value.    Volume.

1948:  I              282             74
       II             331             89
       III            276             80
       Total          295             79
1949:  I              285             80
       II             339            100
       III            277             84
       Total          299             86


59. 140-5 (use weighted aggregate of prices method). 

60. (i) 116-5, 85-8, 114-8, 87-1; (n) 116-5, 114-8, (in) (a) 116-5, ( b ) 116-5. 

61. P 01 ' = 87-7, P os ' - 87-9, Q 01 - 144-7, y 0 » = 227-9. 

62. £369,800, £177,700; 94 9, 97-2. 63. £299,900, £551,100; 135-1, 248-2. 

65. £63,900, £110,900; £60,200, £95,600 66. Total 143,924 ± 190. 

67. 6,967 ± 22; 2,047 + 11, 3,705j± 13; 3,903 ± 14|, 14,328 ± 41; 4,749 ± 18 

Say 35,700 ± 120 

68. 288. 

Exercise 5 


1 . 

3. 

6 . 

7. 

n. 

17. 

10 . 

23. 

24. 
26. 
28. 
30. 
32. 

35 . 

36. 

37. 

38. 

39. 


<x = V14. 2. [a) 2Vli, ( b) 2V14. 

2-21, 2-77; 6-86, 8-30, 2 90, 3-60 4. 1-79, 1-77, 1-31, 0-67, 1-84. 

10-8,28-4,4-16 6. 1-43,1-93; 1-50,1-91; 10-83,13-24. 


\J — 1 9.13-31,15-94. 10.44-5, 2-10. 11. 30-0,10-0 

5, Vtt. 13. i 4V20 15. 74/-, 12-18/- 16. = 16, <r, = 4. 

a = V3-2; M.D. = 1-5 18. a of first n even nos — 


1 


21. Equitable 01 der is C, B, A 
*)* = 2500, X.(x l - a,) s 1012-5, 


= 8, <t 2 = Vl-5. 

x t = 59, --= 27-75 £(;r 

S(* 2 - x t y = 1387-5. 

396, 46. 25. 1159, 70 

156-7,154-7; 12-94, 21-3. 27. 32-02, 13-18, 8-7. 

3-40,4-39,5-60; [x — M„)/ct = 0-64. 29.1-68,2-07. 

(i) 4-324, 1-89; ( 11 ) 4-016, 1-52, (m) 1-55, 1-28. 81 . 14 . 21 . 

14-22, 12-07. 33. MD. = 4-46, a = 5-36. 34. 0-924, 1-21; 1-007, 1-27 

Uncarpeted : 124 sq. ft, 45 sq. ft. 

Carpeted: 120 „ 40 „ 

(i) 33-93, 31-72 for males; 27-45, 24-66 for females, (ii) 14-21, 11-63 for 
males; 10-79, 7-93 for females. 

Girth : 8 = 35-71 ms., a = 2-04 ins. Coef. of variation = 5-71. 

Height; 8 = 67*49 ins., <j = 2-64 ms. ,, „ 3 . 91 . 

169, 87; assume range 26 ~ 2999. 

N.W. Region ; 8 = 483-5 yds Md. = 458, Q.D. = 148 s d = 209 
sk. = 3(483-5 ~ 458)/ 209 . 

H. Midland : 8 = 382 yds. Md. = 353, Q.D. = 122 s d = 174 

Sk. = 3(382 — 353)/l 74. ' 





40. = 1060, ff A =- 21-1; H = 1060, ctb = 221. 

These data alone give a false impression. Note that B's distribution is 
U-shaped. 

Exercise 6 

1. r = - 0-78, y = 18$ - 1$*, a- = 10-4645 - 0-4545>>. 2. 0-8. 

8. r = — 0-78. 4. r = — 0-76. 6. y = 3^V* +„9f •- 

6. r = 0-905, >• = 0-267* + 0-495, * = 3-852y - 1-812 7. 0-84. 

8. 0-82; Y - 42 = 1-1682(X - 56), X - 56 = 0-5695(Y - 42). 

9. Max. 0-832, min. 0-799. 10. y = 10-3 - 2-21*. 11. 0-65. 

12. y - 0-465* + 5-9, * = 0-914y + 3-0 13. y = 0-8245* + 19-6. 

14. r = 0-97; S„^=4-9. 15. 0-69, 0-73 16. y = 0-732* - 0-33. 

17. Y = 0-5722X - 0-10 18. - 0-74; Y = 270-5 - 1-573X. 

*19. y = 1-229* + 4-04, * = 0-7605y - 2-91, r = 0-97. 20. 0-594 

21. Y = 0-4165X - 21-14, X -- 2 25Y + 58-75. 

22. Y = 297X - 256 23. Y = 49 - 0-2405X. 

24. (i) r = 0-83, 5, ^ 0-8096, (m) r = 0-806, b t =- 0-797, = 0-816. 

26. r - 0-27. 

26. (i) r =- 0-537, Y - 44-38 =- 0-70(X - 36-95), X - 36-95 = 0-413(Y - 44-38); 
(u) S„ = Viei-SCSfl -0 : 537*), S*=- V / 95 : 326(l - 0-537*). 

27. 0-766 

28. y - 23-2 = 0-8575(* - 24-5), * - 24-5 = 0-6841 (y - 23-2); 

S„ = Vin^Hl - 0-766*), S* = V73Tl"—~0-766*). 

29. cr^-ir) 8 = 39-31 : 1-6724 in C I. units (squared) 

30. r = 0-45, Y = 0-70X + 8-11, X = 0-29Y + 42-93; 

S„ = Vl67-47(1 - 0-45*), S* = V / 69-42(T- 045 i ). 

31. a d 2 = 1*398, <r, 2 * ~ 3-340 in C.I. units {squared) 

32. 7) V = 0*47, r) x = 0*49 33. 0*77. 34. 0 065. 

35. r ■= 0*804, <r (x _ y) 2 = M184, C.I. units (squared) 36. 0*46 

38. (i) 0*622; (n) B « 0-51A + 12*6; A - 0-76B + 10*4; (ui) a {A _ B) 2 = 2*1226 

C I. units (squared); (iv) rj y = 0*67, = 0*63. 

39. Y = 0-7784X - 0*765, X = 0-5262Y + 6*277 

40. rj v = 0*78, Vx « 0*83. 41. y 9 = 0*61 

42. r = 0-095, = 0-723, r u 0-325 

43. y = 10-125** - 13-88* + 9-9. 44. y =- 1-332** - 20* + 192. 

48. y = 1572 + 54-5* + 8-71* 8 , r = 1441 + 141-5*. 

46. y = 109-6 + 2-716* - 0 183**. 47. y = 1024 + 40-5* + 27-5**. 

48. y = 109 + 0-883* - 0-2457**. 49. Log A === 12-1570, a - 2-3064. 

50. y = 0-51* 1- ** =e= i**. 51. y — 114(1-07)* 52. y = 0-25*> + 4. 
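
Several answers in this exercise give both regression lines together with r. A quick consistency check, offered here as a sketch and not as the book's own method of calculation, uses the standard result that r is the geometric mean of the two regression coefficients and takes their common sign. The function name is illustrative only.

```python
import math


def r_from_slopes(b_yx, b_xy):
    """Correlation coefficient from the two regression coefficients.

    b_yx is the slope of y on x, b_xy the slope of x on y.
    Their product equals r squared, and r carries their common sign.
    """
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("the two regression coefficients must share a sign")
    return math.copysign(math.sqrt(product), b_yx)


# Answer 8: Y - 42 = 1.1682(X - 56), X - 56 = 0.5695(Y - 42); r is given as 0.82.
print(round(r_from_slopes(1.1682, 0.5695), 2))

# Answers 27 and 28: slopes 0.8575 and 0.6841; r is given as 0.766.
print(round(r_from_slopes(0.8575, 0.6841), 3))
```

If the product of the two printed slopes exceeds 1, one of the figures has been misread, since r cannot exceed 1 in absolute value.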

Exercise 7 

Note : (1) Sheppard’s corrections to the moments have not been applied in 
this chapter, or later, unless the question expressly requires their application. 
(2) Moments are stated, in general, as powers of class-interval units. 

1. μ₂ = 2.7465, μ₃ = 0, μ₄ = 26.41; β₁ = 0, β₂ = 3.5.

2. ν₁ = 3, ν₂ = 10.5, ν₃ = 40.5, ν₄ = 168; μ₁ = 0, μ₂ = 1.5, μ₃ = 0, μ₄ = 6.

3. 346, 2.55. 4. 40.87, 1.463, -0.243, 8.56.

5. μ₂ = 5.854, μ₃ ≈ 0, μ₄ = 114.5; β₂ = 3.34.
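
The figures in answer 2 (moments about zero 3, 10.5, 40.5, 168, giving moments about the mean 1.5, 0, 6) follow from the usual conversion formulae. The sketch below applies those formulae and also checks the value β₂ = 3.5 quoted in answer 1; the function name is illustrative, and Sheppard's corrections are not applied, in line with the note at the head of the exercise.

```python
def central_from_raw(v1, v2, v3, v4):
    """Moments about the mean from the first four moments about zero."""
    mu2 = v2 - v1 ** 2
    mu3 = v3 - 3 * v1 * v2 + 2 * v1 ** 3
    mu4 = v4 - 4 * v1 * v3 + 6 * v1 ** 2 * v2 - 3 * v1 ** 4
    return mu2, mu3, mu4


# Answer 2: moments about zero are 3, 10.5, 40.5 and 168.
mu2, mu3, mu4 = central_from_raw(3, 10.5, 40.5, 168)
beta1 = mu3 ** 2 / mu2 ** 3   # measure of skewness
beta2 = mu4 / mu2 ** 2        # measure of kurtosis
print(mu2, mu3, mu4, beta1, round(beta2, 3))

# Answer 1: beta2 from the quoted second and fourth moments.
print(round(26.41 / 2.7465 ** 2, 1))
```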





6. 1.78, -0.1, 8.68.

7. μ₂ = 1.95, μ₃ ≈ 0, μ₄ = 10.23; β₁ ≈ 0, β₂ = 2.7.

8. Area *= 1; /t, = if., fi 3 = 0; M D. = 1-5, Q.D. = 1-39. 
». 4, if-, ip. 11. v, = 19-004, v, = 92-238. 

12. 4-229, 0-376, 63-2; ft = 2-97. 

18. (a) 53$; (6)2-16,5-44; (c) 0-7744, 0-88. 

14. (a) JL», J^L 1 ; (6) $L 2 , 0, |L‘; (c) L/2, LV$. 

15. (1) e 2 /m, (2) e«/6m 2 . 


Exercise 8 


1. 0-117, 0-053, 0-338. 2. (0-117)*, (0-053)*, 0-065, 0-303. 

3. 0-844, 0-191, 0-886, 0-845, 0-005, 0-995. 

4. 0-009, 0-172, 0-991, 10 C 8 (0-72)’(0-28) 3 =- 0 264, 0-702. 

5. Jy. 6. 9'2 7. 0-047. 

8. 0-7241, 0-023. 9. 10. 1458/10=. 

11. 0-224. 12. t -l, 13. 0-037 

14. Afr l-f’-f 15. + (1 - p)y. 16. x = 3. 

17. 4; if » — 6, odds are 61 : I, if n 7, odds are 124 : 1. 

18. 0-245. 19. 25. 

20. ($* + $)($< +$)($f+ $)-- 2 V(/ 3 + 6/ 3 + 1U +6). 21.7. 22.$. 

K41 -HO 

83. P(5) white = ^ 5-0 317. 

24. Mean = 4-95, a 2 = 1-29. 25. 7-5, V3-75. 26. *$$*. 

27. Sums: 3 4 5 6 7 8 9 10 11 12 

Fr. : 1 2 1 1 3 3 1 1 2 1 2/ = 4 3 

P(r) = coef of t' in Jy (t + t 2 + t l + t*)(t 2 -]■ t 3 + t 3 + P) 

28. P(r) is coef of t r m expansion of Jy (t 2 t 3 + f*) 3 . 29. -fa. 

30. y^; g f = gj(l - f 8 ) 3 (l - t )- 3 , P(r) is coef. of t r . 


31- 1. ft & IS 32. 33. X, 2/0$; Y, 1/8. 

34. Exp. = p t ( 1 - p t ) xj- + p t ( 1 - pi) yl- + pip t (x + y )/-. 

= (Pi* + P*y) /-• 

38- T^r, -i%. t 4 ^- 36. Exp = iff of p/15/1 =-- 12/-. 37. 0-08. 

4/2 

88. (170a + 2046)/163 =- /f, .*. ratio is 6 : 1. 

u 

39 . ^(< 2 + < 8 + n *. i 


40. (a) 6 to 1, (6) 6 to 1, i. 

41. W TS, 37 : (li)-nr- *(* - 

42. '•C I >C r _,/‘ lf+rt C r 

44. Sum: 2 3 4 5 

Fr.: 4 4 9 8 


l)/(* + I'M* +3' — 1). 

43- 4iir> TTSTTi ’ tAt- 
6 7 8 

6 4 1 s/ = 6 2 


Mean sum = 


4$; gf. ■= ^ 2 (2< + t* + 2t* + <«)*. 


45. ,-r$x (16 + 24 + 26 + 36 + 22 + 12 + 9) ; mean = 4$. 

46. (i) $, (ii) f, (iii) f. 47. (l) yL, (u) 48. 5/5$. 

49. 4/1. 50. 12/-. 51. ^V, 

52. B's chance is twice A’s, i.e., 140/6 4 : 420/6 5 . 

53. (i) t&, (ii) tVo- 54. 104. 


55. 7/8. 





««• (* - r+y) shalin g s - 

57. (i) ‘C= 0-1536, p + q = 1, .\ p = 0-2, or 0-8 

(ii) = f, pt — f» £i» = sV- 

58. «, «. 

«r X 31-30 

59. Write = x. 

Then Y = ^{1 + *(0-95)* -f * a (0-95) 4 + . . } 

x = A lifO' 95 +'*(0-95) 3 + * 3 (0-95) s + . . 
Y:X=&:^(0-95) 

= 3000: 2945. 

" n im<h = i(?i + q t ) ~ 1 >"* 1 


l — Up i -f p 2 ). 
Exercise 9 


1. (i) 1, 12, 60, 220, 495, 792, 924, 792, . ; (ii) 2, 15, 66, 210, 484, 799, 943, 

799 

2. 0 0730, 0 0193, 0 1938, 79, 4083, 924. 

3. 20% of area lies to left of 1st quintile 

30% lies between 1st quintile and mean. — 0*3000 

and interpolation for ~ between 0-84 and 0*85 gives 0*842 0*842cr. 

4. Binomial 0*056, Poisson 0*055. 


5. 0-670, 0-268, 0-054, 0-007, 0-001. 

Poisson; m = 

0*4 

0-665, 0-277, 0-052, 0-006. Binomial 


6. 0*21. 

7. 0*048. 


8. Mean = 0.89.

                   Frequency       Fr. (nearest
Proportion.        (p x 300).      whole no.).

  0.4107             123.21            123
  0.3656             109.68            110
  0.1627              48.81             49
  0.0483              14.49             14
  0.0107               3.21              3
  0.0019               0.57              1
  0.0001               0.03              -

  1.0000             300.00            300

9. 2P(#) = 1, 2* T(x) = np, 2# 2 P(*) ~np{(q +p) n “ 1 -f (w — l)^(y + P)*"*) 

= + (n — 1)£} =r fi*p* + npq a= v # . 

10. 2P(*) 1,S*P(*) =m 

, 2* 8 P(;r) = we” w {l -f 2m -f 3ra 8 /2 ! + ...} 

= we" m (d! w + we m ) — m {m 2 = v a . 

11. 0*065. 12. 0*992. 13. 2(0*086) = 0*172. 

14. £ * 10, a 2 = 8; n = 50, p «= 0*2, 

15. (i) 39 ' 5 4 — - 2 = 1-875, ~~ = 0-30, i e , about 30; (ii) about 9. 

16. 22-3, 33-5, 25-1, 12-6, 4-7, 1-4, 0-3, 0-1. 17. 61%, 9%, 

18. ftr) = —)|=e •• g - " , P(5) = 0-067, ^(5) = 0-065. 





19. (i) P of 8 or more = 0.1938; of 9 or more = 0.0730; of 10 or more = 0.0193;
of 11 or more = 0.0031. At least 10 correct.
(ii) P(5) = 0.039; P(6) = 0.001. All six.

20. 0.102, 0.004. 21. 34.

22. 0.048.

23. (i) 0.204; (ii) 0.094; (iii) 0.389; 4.59.

24. Note,— I.Q.'s are whole numbers : “ over 110 ” indicates - « -— , 

etc. 2-8, 10-0, 25-6, 0-6, 100 ± 7, 95. 

85. 99-8, 100 ± 37, 100 ± 41 26. 53-17, 54-82, 62-96, 66-02, 70*56. 

27. 83rd. 28. x = 46-1; <x - 11-68, 21-6, 25-3. 29. % = 50-3, a = 10-36 

00. x = 79-945; a = 1-089 (Cl. units). Theoretical frequencies: 3-1, 30-8, 
148-0, 322-1, 319-5, 144-1, 29-6, 2-8, £/= 1000 
31. Si = 85-014; a 2 = 2-2498; a = 1-50, fr. 2, 9, 34, 81, 124, 124, 81, etc. 
Si taken as 85-0 


d 2 v 

32. Equate ~~ to zero . it follows that r 2 


0, i e , x = d: 1* 


Evaluate 


1 /*o-5 

-r= I e~hx*dx by expanding e~i* a and integrating the first 
/2ttX 


few terms. Proportion ~ 0 T 915 

84. X = 0-61005, v (a) = 0-3725, v (3) *= 0-2286, v (4) - 0-1416; ^ « 0*6104 ^ 

/i 3 » 0-6120 x 

85. fc 2 = 1*3491 — -jL = 1-2658, ^ ~ 6-1587 (corrected); ft = 3*85, excess 

0-85. For test of significance of this excess, see p. 271. 

86. About 10 oz. 

37. Mean — 1-1816 (say 1-182), o = 1-167 Expected number = 397. 

Exercise 10 

Note: N.S. = not significant.

H.S. = highly significant.

V.H.S. = very highly significant.

1. 0-029, 0-022. 2. 26 3. 0-65 dr 3(0-048). 4. 0-026, 0-0014. 

„ diff. 6 0-007 , , „ 


p taken as 0-012. 
12. 0-23, 0-11. 


6> s.e. 4-2’ ° r 6-005 " 14 N S 
VHS - 7 - 018 

9. Definitely; —— 6. 

J s.e 

10. 95% limits 19 and 5; say between /38,000 and ^10,000 p taken as 0-012. 
U - = 1-67 ' N S - 8-1 to 11-9%, 10-2 to 14-8%. 12.0-23,0-11. 

18. (a) 1630, 1439; (b) 524, 515; (c) 41-2, 25-2 

lqi 

s e.diii. = 48-3, j— ^ 4, V.H.S. 

14. 33-93 ± 1-96(0-34); 27-44 ± 1-96(0-31). 

15. % — 45-53; t ±2 = 1-137, — 0, = 3-52 (in powers of C I. units); 

Pi — 0, Pi — 2-72 : distribution does not differ significantly from 

normal. ^^5? = 8 . V.H.S. 


s.e. 





1®. ~ = 2-67, H.S. 16-8 -± 2-33(0-3). Probability that the mean of a 

U’a 

random sample of 300 drawn from such a population lies in this range is 
98%. 

17. n = 1537, range = X ± 1-96(0-2552). 

18. =a> 4. Very unlikely. 19. ~~ 2-58, H.S. 

20. 0-018, 0-036, 0-005. 

21. (i) -j-jj = 1*70. N.S. Probability of as great, or greater absolute 

difference = 0-09. 

1*9 

(ii) Qrggg — 1*91* Probability of as great, or greater absolute difference = 
0*056. 

22. 28-57 ± 2-33 4=JL; n 

V 500 

23. (i) ~ 1'26, N.S.; (ii) Quite insignificant. 


3394 (minimum), range 28*57 i (0*0499). 
0*05 


24 - w<£m = 1 ' 56 - NS - 


25. Means : 


= 4. 


, , 0*08 
^ 0*077 ~ 

V.H S. S.d.’s : 


1, N S. 

U 

8*8 


1-25. N.S. 


52 

12*4 : 

26. 0*407 - 0*293; 0*379 - 0*321. 

27. Yes. 0*2 ± 2*58 (0*038). 28. Minima 129, 2064. 

29. (i) x 3 *= 2*50, -JP = 0*057. Yates’ Correction applied m each case. 

(ii) x 2 = 9-64, iP = 0*001. 

(in) (a) = 2*077, JP = 0*075; (6) = 9*2564, *P « 0*0012. 

(iv) x l = 3*90, ^P = 0*024. 

30. v = 5, x 2 = 9*0, P =o= 10% (i.e., no evidence that die is biased). 

31. v = 9, x 2 =^= 15*6. N.S. 

32. v = 8 (with grouping), x 2 — 10*25, P > 20%. No reason to doubt 

hypothesis. 

33. v =?= 8 (11 — 3, with grouping), x 8 =- 3*84. N.S. 

34. v - 6, x 2 - 0*9, P > 30%. 

35. m = 2.75.

No. of deaths per month:    0      1      2      3      4      5      6     7 and over
Observed f.:                3      2      6      5      5      1      2         0
Theoretical f.:            1.53   4.22   5.80   5.32   3.66   2.01   0.92      0.54
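
The theoretical frequencies in answer 35 can be reproduced with a short sketch, assuming (as the figures indicate) that a Poisson distribution is fitted with m equal to the observed mean and that the last class, "7 and over", takes whatever probability remains.

```python
import math

# Observed numbers of months with 0, 1, ..., 6 and "7 and over" deaths.
observed = [3, 2, 6, 5, 5, 1, 2, 0]

n = sum(observed)                                     # number of months
m = sum(k * f for k, f in enumerate(observed)) / n    # mean deaths per month

# Poisson probabilities for 0..6, with the remainder assigned to "7 and over".
probs = [math.exp(-m) * m ** k / math.factorial(k) for k in range(7)]
probs.append(1 - sum(probs))

expected = [round(n * p, 2) for p in probs]
print(m, expected)
```

The expected frequencies sum to the number of months, which is a convenient check before the classes are grouped for the χ² test.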

30. v = 9, x % = 7*4, P > 50% ; x 2 - 19*2, P =£= 0*02. 

87. P(*) = 


# ! 

,r; 0 

/: 17,150 


1 

1896 


2 

105 


S/> 


39, x a — 4*695 (uncorrected), v ■ 

= 3*23 (corrected) 

40. = 4-23. V.H.S. 


1,~ = 2-167, P =2:0-03. 

O’ 


19,155 

S. 


a 


1*80, =2:0*07. N.S. 

41. No. x 3 —119* 





42. χ² = 5.49, x/σ = 2.34, P < 0.02 (i.e., there is probably association).

43. √χ² = 3.143. P < 0.002. 44. (a) 0.55, (b) 1.36.

45. χ² = 4.62, x/σ = 2.15, P ≈ 0.032 (without Yates' Correction),

χ² = 4.01, x/σ = 2.0, P ≈ 0.046 (with Yates' Correction).

48. There can be no doubt of the association. 

47. s* = 3-25, t = 2-08, v = 24 : just significant at 0-05 level; = 2-06. 

48. v «= 15, t = 2-50, # (0 . 02) = 2-60; not likely. 12-102 ~ 11-998. 

49. v = 9, t = 1-47. NS. 

o.iao 

50. v = 25, t = 4-03. V.H.S. 42-5 ± 2-79 2 -=-. (i.e., 44-23 ~ 40-77). 

v 

51. v = 9, f = 2-30. S 52. v - 11, t - 1-70. N.S. 

53. v = 9, f = 1-81. NS. 54. „ = 8, t = 2-02. N.S. 

55. v «= 9, t = 4-74. V Il.S. 56. v = 7, f = 1-43 N.S. 

57. v = 9, t — 2-87 S at 0-02 level. 58. v - 18, t ^ 1. Definitely N.S. 

69. v = 10, t - 2-39. S 

80. (i) v = 7, t = 7-33, V.H.S., t> — 6, not at all significant; (n) v = 13, 

t =- 2-56. .'. S. 

81. v = 18,2 = 2-01. N S. 62. v =~ 10, t = 1-25. N.S. 

63. v =- 39, t — 1-70. NS. 95% limits. 1-5 ± 1-78. Note zero is in this range 

64. v = 12, t =- 1-74. N.S. 

85. v — 16, t — 0-42 ; test does not at all discriminate 

66. v = 16, i = 2-43, S.; f l0 06) = 2-12, f (0 . 0S) = 2-58; gives reason to suspect 

hypothesis that variables are uncorrelated. 

67. 0-32, 0-413. 68. v = 20, t - 2-07, not quite S at 0-05 level. 

69. 37. 

70. Li 1 ~ t il — =_ 0-37. Not at all significant, ( ——is a normal 

s e. 0-227 e V s e. 

deviate.^ 

71. i — -■= ^ 1-84. NS (1-84 <' 1-96 .'.NS at 0-05 level). 

s.e 

p may be 0*55 as far as the evidence goes. 

|« _ z \ 0*355 

72 . Cl-= -fr-zn ; no grounds for doubting hypothesis 

s.e. 

78. 0-333 < < 0-922. 


74.
Source.               d.f.    Sum of squares.    Mean square.     F.

Between schools         3           29               9.67        4.59
Within schools         28           59               2.107
Total                  31           88

ν₁ = 3, ν₂ = 28, F(0.01) = 4.6. The difference between school means is
therefore highly significant.





75. S.e. of the difference between the means of two samples of 8 is
√{2.11 x (1/8 + 1/8)} = 0.726, with 28 d.f.

ν = 28: t(0.05) = 2.05, t(0.02) = 2.47, t(0.01) = 2.76.

Least sig. diff. at 0.05 level = 2.05 x 0.726 = 1.49.
Least sig. diff. at 0.02 level = 2.47 x 0.726 = 1.79.
Least sig. diff. at 0.01 level = 2.76 x 0.726 = 2.00.

Schools.   Difference.                            Schools.   Difference.

A-B           1.0     N.S.                        B-C           0.75    N.S.
A-C           1.75    S. about 0.02 level         B-D           1.75    S. about 0.02 level
A-D           0.75    N.S.                        C-D           2.50    S. beyond 0.01 level
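
The least significant differences in answer 75 can be sketched as follows. The within-schools mean square (2.11), the sample size (8) and the t values for 28 d.f. are taken from the answer itself; the function names are illustrative only.

```python
import math


def se_of_difference(error_mean_square, n1, n2):
    """Standard error of the difference between two sample means."""
    return math.sqrt(error_mean_square * (1 / n1 + 1 / n2))


se = se_of_difference(2.11, 8, 8)

# Two-tailed t values for 28 d.f., as quoted in the answer.
t_values = {0.05: 2.05, 0.02: 2.47, 0.01: 2.76}

# Least significant difference at each level: t x s.e.
lsd = {level: round(t * se, 2) for level, t in t_values.items()}
print(round(se, 3), lsd)


def strictest_level(difference):
    """Smallest tabulated level at which the difference is significant, or None."""
    passed = [level for level, limit in lsd.items() if abs(difference) >= limit]
    return min(passed) if passed else None


print(strictest_level(1.0), strictest_level(1.75), strictest_level(2.50))
```

A difference of 1.75 clears the 0.05 limit and falls just short of the 0.02 limit, which agrees with the answer's description "S. about 0.02 level".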


76.
                      d.f.    Sum of squares.    Mean square.     F.

Between samples         3       20,401.25           6800.4       8.97
Within samples         12        9,098.5             758.2
Total                  15       29,499.75

ν₁ = 3, ν₂ = 12, F(0.01) = 6.0. Difference between treatment means is
highly significant. S.e. of difference = √(758.2 x 2/4) = 19.47, with
12 d.f.

Treatment.   Difference.

I-II           12.25     N.S.
I-III          89.5      Beyond 0.001 level
I-IV           56.75     At 0.02 level
II-III         77.25     Between 0.01 and 0.001 levels
II-IV          44.50     S. at 0.05 level
III-IV         32.75     N.S.

77.

                      Sum of squares.   d.f.   Mean square.    F.

Between columns         20,401.25         3       6800        17.3    Sig. beyond 0.001 level
Between rows             5,558.75         3       1853         4.7    Sig. beyond 0.05 level
Error                    3,539.75         9        393
Total                   29,499.75        15

S.e. of difference between means of two columns, or two rows, is
√(393 x 2/4), i.e., 14.0, with 9 d.f.

Recalculate least significant differences with the new s.e. and rewrite
table of answers to Question 76.

For the rows, differences for I-III, I-IV, III-IV, are not significant.
I-II is significant between 0.02 and 0.01 levels, II-III between 0.05
and 0.02 levels, II-IV beyond 0.01 level.

78.
               d.f.   Sum of squares.   Mean square.    F.

Columns          3         5.5             1.83        4.89    Sig. between 0.05 and 0.01 levels
Rows             3         5.625           1.875       5.00    Sig. between 0.05 and 0.01 levels
Error            9         3.375           0.375
Total           15        14.500





79. S.e. of difference between column, or row, means = √(0.375 x 2/4) = 0.433.

                          Significant                               Significant
Towns.      Difference.       at:        Towns.      Difference.       at:

A-B            1.25           0.02       B-C            1.50           0.01
A-C            0.25           N.S.       B-D            1.25           0.02
A-D             -                        C-D            0.25           N.S.

                          Significant                               Significant
Quarters.   Difference.       at:        Quarters.   Difference.       at:

I-II           0.375          N.S.       II-III         0.750          N.S.
I-III          1.125       0.05-0.02     II-IV          0.875          N.S.
I-IV           0.5            N.S.       III-IV         1.625       beyond 0.01


80.
               d.f.   Sum of squares.   Mean square.     F.

Columns          3        155.28            51.76        1.99    N.S.
Rows             3        259.31            86.44        3.32    N.S.
Treatments       3       1372.12           457.37       17.55    V.H.S.
Error            6        156.37            26.06
Total           15       1943.08

S.e. of difference between means of columns, or rows, or treatments =
√(26.06 x 2/4) = √13.03 = 3.61.

For 6 d.f., t(0.05) = 2.45; least significant difference at 0.05 level = 2.45 x
3.61 = 8.84.

Treatment.   Difference.                 Treatment.   Difference.

A-B            14.0    V.H.S.            C-B             9.4    S. at 0.05 level
A-C            23.4                      C-D             1.6    N.S.
A-D            21.8                      D-B             7.8    N.S. at 0.05 level, but difference
                                                                may be great enough to raise a doubt.


81.
               d.f.   Sum of squares.   Mean square.     F.

Chambers         3       230.6875           76.89
Times            3      1993.6875          664.56        2.24    N.S. at 0.05 level
Counters         3       923.1875          307.73        1.04    N.S. at 0.05 level
Error            6      1777.8750          296.31
Total           15      4925.4375

S.e. of difference between means = √(296.31 x 2/4) = 12.18.

Least significant difference at 0.05 level = 12.18 x 2.45 ≈ 30.

None of the differences between means of chambers, or times, or counters,
is significant at the 0.05 level.





82.
               d.f.   Sum of squares.   Mean square.     F.

Columns          3          66.5            22.17
Rows             3       2,444.5           814.83        20.8
Treatments       3      13,548.5          4516.17       115.5
Error            6         234.5            39.1
Total           15      16,294.0

For Rows, F = 20.8 approaches significance at the 0.001 level.
For Treatments, F = 115.5 is very highly significant indeed.

Mean yields.                 A.         B.         C.         D.

Lbs. per 1/50 acre         251.75     314.5      327.25     310.5
Tons per acre                5.619      7.020      7.304      6.931

S.e. of mean of sample of 4 plots = √(39.1/4) = 3.1265 lbs. per 1/50 acre plot
= 0.07 tons per acre.


83.
               d.f.   Sum of squares.   Mean square.     F.

Squares          1           18               18
Rows             4       11,489             2872        21.1    ν₁ = 4, ν₂ = 6, F(0.001) = 21.9
Columns          4        5,763             1441        10.6
Treatments       2        2,277             1138         8.4    ν₁ = 2, ν₂ = 6, F(0.05) = 5.1
Residual         6          815              136
Total           17       20,362



N.B. For economy in calculation, tabular yields were put in a coded form.
This makes no difference to the values of F shown above,
but if it is desired to apply them to the actual yields, variances must be
multiplied by 100, and standard errors by 10.
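
The effect of coding described in the note above can be illustrated with a small sketch. The exact coding used in the book is not legible in this copy; the sketch assumes a subtraction of a working origin followed by division by 10, which is consistent with the stated factors of 100 and 10. The yields and the origin of 300 are hypothetical.

```python
import statistics

# Hypothetical plot yields for two groups (not the data of the exercise).
group_a = [251.0, 262.0, 240.0, 254.0]
group_b = [314.0, 298.0, 327.0, 310.0]


def code(values, origin=300.0, unit=10.0):
    """Coded value = (actual - origin) / unit; origin and unit are assumptions."""
    return [(v - origin) / unit for v in values]


var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
cvar_a, cvar_b = statistics.variance(code(group_a)), statistics.variance(code(group_b))

# Variances of the actual yields are 100 times those of the coded yields.
print(round(var_a / cvar_a, 6), round(var_b / cvar_b, 6))

# Standard deviations (and hence standard errors) differ by a factor of 10.
print(round(statistics.stdev(group_a) / statistics.stdev(code(group_a)), 6))

# A ratio of two variances, such as F, is unaffected by the coding.
print(round(var_b / var_a, 6), round(cvar_b / cvar_a, 6))
```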

84. r = 0.537, η_y = 0.554, η_x = 0.554. Regression is linear.

85. r = -0.58, η_y = 0.90, η_x = 0.63. Regression is not linear.

86. Averages chart: X̄ = 0.0505; inner limits, 0.0499 ~ 0.0511,

outer limits, 0.0496 ~ 0.0514.

Ranges chart: w̄ = 0.0015; upper limits: outer 0.0035, inner 0.0027.

Additional Exercises Ch, V—X 

1. ( 1 ) a* S/(* - x)\ where N = 2/. 

= - 2*2/* + N* ! } 

= - 2*« + ** = S> - * a . 

(ii) a 1 - 4-536. 

8. * = 10-85, <r = 3-162. 


3. (i) 3-40, 3-40; (ii) 0-773, 0-60. 





(i) n, =s 3-733, ju> = — 0184, mi = 39-61, (powers of C T. units); 

(ii) 0-025, 2-84. 

(i) 15-41; (xi)-l-716, 2-133. 

VJl = 0-05, fa = 2-59; y 2 = 0-41, = 1-67. N.S. 

x\ 10- 11- 12- 13- 14- 15- 16- 17- 18- 19- 20- 

Theoretical 

frequencies: 7*7 14*2 29*7 50 0 67-8 74*1 65*2 46*3 26*4 12*2 6*3 
£/ = 399*9 

v==ll — 3 = 8. Taking frequencies as nearest whole number ~ 4 : 
not at all significant. 

15-41 ± 1-96 ; 2-133 ± 1-96 ?^E. 

V400 V 800 


15-41 ± 3-09 

V 25 


10. 15-41 ± ]-96 

V 400 


Relative error 


/a q I N — n\ I a /N — n 


/n - I 

: \'N - w 


i-’LrJ 

2N ' 


i.e., percentage error: 
12. 35-48, 4*65. 


_ 100 [n ~ 1) _ 50(w - 1) 


13. *: 

Theoretical 

20 - 

22 - 

24- 

26- 

28 

30- 

32- 

34- 

frequencies: 

1*9 

4-9 

13*8 

33-0 

65-3 

107*7 

148-1 

169*5 

^: 

Theoretical 

36- 

38- 

40- 

42- 

44- 

46- 

48- 

50- 

frequencies: 

161-7 

128*6 

85 1 

47-0 

999*9 

21*6 

8-2 

2*0 

0*9 


14. (i) x 2 is H.S.; (ii) fi 2 = 5-3974, /i 3 - 1*4573, /x 4 = 107 07, (powers of 
C.I. units); 

Vfa = 0-1162, fa = 3-68; £ = > 4 V.H S. 


-- - 0-155 . 

15? Sample A: x — 27-28, a - 6-75. Sample B : x =. 32-82, a = 7*85. 

Diff. between means 5*54 ,, , , 

18. Yes.-— >14 Very highly so indeed. 

S.e. D'oo 


^ A r X/JUl* L/V l. V Y VvV-il AAiVUiJLO 

18. Yes.- 

s.e. 

17. * = 30*45, <j = 7 89, s.e i 


; 30-45 ± 1-96 Jll 9 - . 
V1520 V1520 


5, 27-5, 162-5, 1017-5, (about zero); 2-5, 0, 17-5, (about mean). 

3-481, 1-476, 1-842. 

1st Group : 27-27, 4-11, 2-71, 2-90. 

2nd Group : 28-40, 3-86 [Assume range 15-50]. 

Say x = 46, a = 7-5. 22. 2-28%, 4-15%, 49-5%. 

7-2, 42-7, 53-2, 14-8. _ 24. (i) 36 sq. units; (ii) 3, 4*: (&) f 0. 

Approx. 1*041, f, 3Vj. 26. /x a = 1, = 2, /x 4 =. 9. 

p. £• £• 28 - M O'203a, §, a V?; (ii) |*, £. 






29. About 1%, or less. 

30. - 1) = ~ $Zfx = v, - *, etc. 


31. P(5) = tffo, P(3) = -&m> = HUi- 

32. (i) 1, 3, 6, 10, 15, 18, 19, 18, etc. £/ = 5® » 125. 

Frequency of a sum of r is the coef of # r ~ 3 in the expansion o 
(1 — ^ B ) s (l — 

(ii) Mean — 3 n, s.d. — V2n. 


33. S.e. of mean of samples so drawn 


16-2 


0-573. 


Dig. 

s.e. 


34. (i) 


V 800 

Very unlikely indeed. The 99-8% F.L. = 146-2 ~ 149-8. 
Diff. (means) __ 5 


0-573 


> 5 


0-80 


> 6. V.H.S. 


, Diff. (s.d.'s) _ 0-5 
W s e. “ 0-507- 


Not a 


all significant. 

1 K.Q 

*»-(-) 145±2 ‘ 33 Vl6§- i e> 

(150 ± 1-343). 


16-4 

(145 ± 1-302); (fc) 150 ± 2-33 ~~=, i.e. 


36. n not less than 21,685, say 21,690. 37. — = .^1 = 1 . 43 . N.S. 

7 s e. 0*014 

no , v Diff. 9 , , „ c , . .. Diff. 23 _, o xr c 

38. ( 1 )-— Tn~ 7 \i not at all significant; ( 11 )-= - -- - - a 1*8, N.S. 

' s e. 17-9 ° w se. 12-64 

39. Probability of 4 or more is 0-242. 

40. (i) 123.1 + 8.293x = y; (ii) 30.0x − 246 − 0.2893x² = y.

41. y = 1.25x² − 14.9x + 198.

42. y = 0.2093x² − 2.633x + 10. 43. Say, y = 8.4(1.2)^x.

44. r = −0.953. 46. r = −0.6. 47. −0.95.


48. y − 49.3 = −0.78843(x − 35.8), say y = 77.5 − 0.7888x.

x − 35.8 = −1.15211(y − 49.3).
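The two regression coefficients in answer 48 determine the correlation coefficient, since r² is their product and r takes their common sign. The sketch below assumes that standard relation; the names are illustrative.

```python
import math


def correlation_from_regressions(b_yx, b_xy):
    """Return r given the regression coefficients of y on x and of x on y."""
    if b_yx * b_xy < 0:
        raise ValueError("the two regression coefficients must share a sign")
    r = math.sqrt(b_yx * b_xy)
    return -r if b_yx < 0 else r


r = correlation_from_regressions(-0.78843, -1.15211)

# Intercept of the y-on-x line through the means (35.8, 49.3).
intercept = 49.3 + 0.78843 * 35.8
print(round(r, 3), round(intercept, 1))
```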

49. r = −0.74. 50. r = 0.77.

51. y − 66.65 = 2.202(x − 30.325), x − 30.325 = 0.271(y − 66.65).

53. η_y = 0.810, η_x = 0.776. No reason to suspect that regression is not linear.

54. Y − 44.1 = 0.78(X − 53.2), Y = 0.78X + 2.6; X − 53.2 = 0.593(Y − 44.1), X = 0.6Y + 27.

55. Probability of 4 or more heads = 0-6866. 56. (i) yf^, (ii) (iii.) ^ 

57. (i) 28, (ii) 37, (iii) 33. Total fr. = 7³ = 343. Frequency of sum
r = coef. of t^r in expansion of (1 − t⁷)³(1 − t)⁻³.

58. P(r) = *** X coef. of f r ~ 4 in expansion of (1 — tf 4 )(l — f) -1 . 

59. P(r) = rf, X coef. of t r in expansion pf (1 — / 4 )( 1 — t)~ l . 

60. I (10) + i(5) + (2-5) + . . . 


W +1 + (*)* + (1)’ +•••)=- v(rrrj) = 4 /" 

61. Thrown twice: no. of different products = 18: P(6) = P(12) =* 

Thrown three times : P(12) = -fa = P{24); P(36) = ■Jfc. 

62. (i) P(U) - jh. (») P(17) or P(18) = & + -fo = *. 

63. 29-9 shillings. 

64. xyz/(x+y+z)C₃, i.e., 6xyz/[(x + y + z)(x + y + z − 1)(x + y + z − 2)].
65. £1/18/0. 66. £1/11/0.





67. Diff./s.e. = 2.5; difference is significant.

68. χ² = 2.341 with Yates' Correction, ν = 1. N.S.

69. x = 11; i.e., actual number of coloured is 73, or 50.

70. Sample I: χ² = 2.9648; Sample II: χ² = 3.2111. With Yates' Correction in
each case: neither is significant at the 0.05 level.

Combined samples: χ² = 6.537. √6.537 = 2.55, which suggests
that the hypothesis may well be wrong.

71. χ² = 5.949 uncorrected, χ = 2.44, P = 0.015; χ² = 4.968 corrected, χ = 2.23,

P = 0.026. S.
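With one degree of freedom, χ is a normal deviate, so the probabilities in answer 71 can be checked from the normal curve. The sketch below assumes the two-tailed probability P = erfc(χ/√2); the function name is illustrative.

```python
import math


def chi_square_probability_1df(chi_square):
    """Probability of exceeding chi_square by chance when v = 1."""
    chi = math.sqrt(chi_square)              # equivalent normal deviate
    return math.erfc(chi / math.sqrt(2))     # both tails of the normal curve


p_uncorrected = chi_square_probability_1df(5.949)
p_corrected = chi_square_probability_1df(4.968)
print(round(p_uncorrected, 3), round(p_corrected, 3))
```

This shortcut applies only to ν = 1; for other degrees of freedom a table of χ² is still needed.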

72. For ν = 2, χ² = 9.44 is H.S., i.e., the hypothesis is not supported.

73. Diff./s.e. = 2.8/2.7; quite likely.

74. ν = 9, t = 0.62; no doubt thrown on assumption. 75. 97.2 ± 10.2.

76. t = 2.64 for difference between means of the samples; with ν = 18,
t(0.02) = 2.55. It is therefore unlikely that the two samples can be
drawn at random from the same normal population.

77. ν = 18, t = 1.43. N.S. 78. ν = 25, t = 1.58. N.S.

79. (i) 0-381, (ii) 0-445. 

80. = 1*63, N.S , i.e., no evidence against hypothesis 

81. ν₁ = 7, ν₂ = 76, F = 2.757; since F(0.05) = 2.13 the variables are probably
correlated.

82. ν₁ = 6, ν₂ = 76, F = 1.97; assumption of linearity is not discredited.


83.

Source         Sums of squares   d.f.   Mean square     F.
Columns              69.14         5       13.83       1.62    N.S.
Rows                 43.47         5        8.69       1.02    N.S.
Treatments          521.47         5      104.29      12.21    V.H.S.
Error               170.89        20        8.54        -
Total               804.97        35         -          -
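The mean squares and variance ratios in the table for answer 83 follow from the sums of squares and degrees of freedom. A minimal sketch, assuming each F is the mean square for that source divided by the error mean square; the function name is illustrative.

```python
def analysis_of_variance(sums_of_squares, degrees_of_freedom, error_key="Error"):
    """Return (mean_squares, variance_ratios) for each source of variation."""
    mean_squares = {
        source: sums_of_squares[source] / degrees_of_freedom[source]
        for source in sums_of_squares
    }
    variance_ratios = {
        source: ms / mean_squares[error_key]
        for source, ms in mean_squares.items()
        if source != error_key
    }
    return mean_squares, variance_ratios


ss = {"Columns": 69.14, "Rows": 43.47, "Treatments": 521.47, "Error": 170.89}
df = {"Columns": 5, "Rows": 5, "Treatments": 5, "Error": 20}
mean_squares, variance_ratios = analysis_of_variance(ss, df)

for source in ss:
    print(source, round(mean_squares[source], 2),
          round(variance_ratios.get(source, float("nan")), 2))
```

Small differences in the last figure can arise according to whether the mean squares are rounded before the ratio is taken.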



84. S.e. of the total yield of a treatment = √(8.54 × 6) = 7.16 bushels per
acre.

S.e. of difference between means of two treatments = √(8.54 × ⅓) = 1.688.

Differences between means of 3.53, 4.27, 4.81 and 6.49 are significant at
the 0.05, 0.02, 0.01 and 0.001 levels respectively.
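The figures in answer 84 can be reproduced from the error mean square. The sketch below assumes six plots per treatment and takes the values of t for ν = 20 from Table IV; the variable names are illustrative.

```python
import math

error_mean_square = 8.54
plots_per_treatment = 6        # assumed from the 6 x 6 layout of answer 83

# Standard error of a treatment total and of a difference between two means.
se_total = math.sqrt(error_mean_square * plots_per_treatment)
se_difference = math.sqrt(error_mean_square * 2 / plots_per_treatment)

# Values of t for v = 20 (Table IV) at each level of significance.
t_values = {0.05: 2.09, 0.02: 2.53, 0.01: 2.85, 0.001: 3.85}
least_significant = {p: t * se_difference for p, t in t_values.items()}

print(round(se_total, 2), round(se_difference, 3))
for level, difference in least_significant.items():
    print(level, round(difference, 2))
```

The last figure may differ by one unit in the second decimal place from the printed answer, depending on rounding of the standard error.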

Suppose that the six treatments in the above example are every possible
combination of a nitrate at two different levels and a phosphate at three
different levels, symbolized by N₀P₀, N₀P₁, N₀P₂, N₁P₀, N₁P₁, N₁P₂, with
N₀P₀ corresponding to no nitrate and no phosphate.

Then it is possible to carry the analysis much farther, since sums of
squares, etc., due to N₁, P₁, P₂, and to the interactions N₁P₁ and N₁P₂
can be determined. The reader who is interested must consult books
dealing specifically with the application of statistical methods to Biology
or Agriculture.



TABLES 




TABLE I 

Ordinates of the Normal Curve 



x/σ    0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09

0.0   0.3989  0.3989  0.3989  0.3988  0.3986  0.3984  0.3982  0.3980  0.3977  0.3973
0.1   0.3970  0.3965  0.3961  0.3956  0.3951  0.3945  0.3939  0.3932  0.3925  0.3918
0.2   0.3910  0.3902  0.3894  0.3885  0.3876  0.3867  0.3857  0.3847  0.3836  0.3825
0.3   0.3814  0.3802  0.3790  0.3778  0.3765  0.3752  0.3739  0.3725  0.3712  0.3697
0.4   0.3683  0.3668  0.3653  0.3637  0.3621  0.3605  0.3589  0.3572  0.3555  0.3538
0.5   0.3521  0.3503  0.3485  0.3467  0.3448  0.3429  0.3410  0.3391  0.3372  0.3352
0.6   0.3332  0.3312  0.3292  0.3271  0.3251  0.3230  0.3209  0.3187  0.3166  0.3144
0.7   0.3123  0.3101  0.3079  0.3056  0.3034  0.3011  0.2989  0.2966  0.2943  0.2920
0.8   0.2897  0.2874  0.2850  0.2827  0.2803  0.2780  0.2756  0.2732  0.2709  0.2685
0.9   0.2661  0.2637  0.2613  0.2589  0.2565  0.2541  0.2516  0.2492  0.2468  0.2444
1.0   0.2420  0.2396  0.2371  0.2347  0.2323  0.2299  0.2275  0.2251  0.2227  0.2203
1.1   0.2179  0.2155  0.2131  0.2107  0.2083  0.2059  0.2036  0.2012  0.1989  0.1965
1.2   0.1942  0.1919  0.1895  0.1872  0.1849  0.1826  0.1804  0.1781  0.1758  0.1736
1.3   0.1714  0.1691  0.1669  0.1647  0.1626  0.1604  0.1582  0.1561  0.1539  0.1518
1.4   0.1497  0.1476  0.1456  0.1435  0.1415  0.1394  0.1374  0.1354  0.1334  0.1315
1.5   0.1295  0.1276  0.1257  0.1238  0.1219  0.1200  0.1182  0.1163  0.1145  0.1127
1.6   0.1109  0.1092  0.1074  0.1057  0.1040  0.1023  0.1006  0.0989  0.0973  0.0957
1.7   0.0940  0.0925  0.0909  0.0893  0.0878  0.0863  0.0848  0.0833  0.0818  0.0804
1.8   0.0790  0.0775  0.0761  0.0748  0.0734  0.0721  0.0707  0.0694  0.0681  0.0669
1.9   0.0656  0.0644  0.0632  0.0620  0.0608  0.0596  0.0584  0.0573  0.0562  0.0551
2.0   0.0540  0.0529  0.0519  0.0508  0.0498  0.0488  0.0478  0.0468  0.0459  0.0449
2.1   0.0440  0.0431  0.0422  0.0413  0.0404  0.0395  0.0387  0.0379  0.0371  0.0363
2.2   0.0355  0.0347  0.0339  0.0332  0.0325  0.0317  0.0310  0.0303  0.0297  0.0290
2.3   0.0283  0.0277  0.0270  0.0264  0.0258  0.0252  0.0246  0.0241  0.0235  0.0229
2.4   0.0224  0.0219  0.0213  0.0208  0.0203  0.0198  0.0194  0.0189  0.0184  0.0180
2.5   0.0175  0.0171  0.0167  0.0163  0.0158  0.0154  0.0151  0.0147  0.0143  0.0139
2.6   0.0136  0.0132  0.0129  0.0126  0.0122  0.0119  0.0116  0.0113  0.0110  0.0107
2.7   0.0104  0.0101  0.0099  0.0096  0.0093  0.0091  0.0088  0.0086  0.0084  0.0081
2.8   0.0079  0.0077  0.0075  0.0073  0.0071  0.0069  0.0067  0.0065  0.0063  0.0061
2.9   0.0060  0.0058  0.0056  0.0055  0.0053  0.0051  0.0050  0.0048  0.0047  0.0046
3.0   0.0044  0.0043  0.0042  0.0040  0.0039  0.0038  0.0037  0.0036  0.0035  0.0034
3.1   0.0033  0.0032  0.0031  0.0030  0.0029  0.0028  0.0027  0.0026  0.0025  0.0025


(Adapted, by permission of the Trustees of Biometrika, from Table II of Tables for Statisticians and 
Biometricians, Part I.) 

For Frequency = N, and s.d. = σ, multiply tabular y by N/σ for actual ordinate.
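The tabular ordinates and the N/σ scaling can also be computed directly. The sketch below assumes the usual formula y = e^(−z²/2)/√(2π) for the standardized normal curve; the frequency of 1000 and s.d. of 5 in the usage line are hypothetical figures chosen only for illustration.

```python
import math


def tabular_ordinate(z):
    """Ordinate of the standardized normal curve at deviate z = x / sigma."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def actual_ordinate(z, total_frequency, sd):
    """Scale the tabular ordinate by N / sigma, as the note to Table I directs."""
    return tabular_ordinate(z) * total_frequency / sd


y_at_mean = tabular_ordinate(0.0)
y_at_one = tabular_ordinate(1.0)
# Hypothetical distribution: N = 1000, s.d. = 5, ordinate at the mean.
scaled = actual_ordinate(0.0, 1000, 5)
print(round(y_at_mean, 4), round(y_at_one, 4), round(scaled, 2))
```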





TABLE II 


Proportion of area under the normal curve between the mean ordinate and the ordinate at the
standardized deviate from the mean.*


x/σ    0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09

0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
3.1   0.4990  0.4991  0.4991  0.4991  0.4992  0.4992  0.4992  0.4992  0.4993  0.4993



* (Adapted, by permission of the Trustees of Biometrika, from Table II of Tables for Statisticians and
Biometricians, Part I.)
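The areas of Table II can be reproduced from the error function. This is a sketch assuming the area between the mean and a deviate z equals ½ erf(z/√2); the function name is illustrative.

```python
import math


def area_from_mean(z):
    """Proportion of the normal curve lying between the mean and deviate z."""
    return 0.5 * math.erf(z / math.sqrt(2))


area_one = area_from_mean(1.0)
area_196 = area_from_mean(1.96)
# Proportion within 1.96 standard deviations on either side of the mean.
central_proportion = 2 * area_196
print(round(area_one, 4), round(area_196, 4), round(central_proportion, 2))
```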





TABLE III 
Table of χ²

                              Probability.
 ν     0.90    0.50    0.10    0.05    0.02    0.01   0.001

 1    0.016    0.46    2.71    3.84    5.41    6.64   10.83
 2     0.21    1.39    4.61    5.99    7.82    9.21   13.82
 3     0.58    2.37    6.25    7.82    9.84   11.34   16.27
 4     1.06    3.36    7.78    9.49   11.67   13.28   18.47
 5     1.61    4.35    9.24   11.07   13.39   15.09   20.52
 6     2.20    5.35   10.65   12.59   15.03   16.81   22.46
 7     2.83    6.35   12.02   14.07   16.62   18.48   24.32
 8     3.49    7.34   13.36   15.51   18.17   20.09   26.13
 9     4.17    8.34   14.68   16.92   19.68   21.67   27.88
10     4.87    9.34   15.99   18.31   21.16   23.21   29.59
11     5.58   10.34   17.28   19.68   22.62   24.73   31.26
12     6.30   11.34   18.55   21.03   24.05   26.22   32.91
13     7.04   12.34   19.81   22.36   25.47   27.69   34.53
14     7.79   13.34   21.06   23.69   26.87   29.14   36.12
15     8.55   14.34   22.31   25.00   28.26   30.58   37.70
16     9.31   15.34   23.54   26.30   29.63   32.00   39.25
17    10.09   16.34   24.77   27.59   31.00   33.41   40.79
18    10.87   17.34   25.99   28.87   32.35   34.81   42.31
19    11.65   18.34   27.20   30.14   33.69   36.19   43.82
20    12.44   19.34   28.41   31.41   35.02   37.57   45.32
21    13.24   20.34   29.62   32.67   36.34   38.93   46.80
22    14.04   21.34   30.81   33.92   37.66   40.29   48.27
23    14.85   22.34   32.01   35.17   38.97   41.64   49.73
24    15.66   23.34   33.20   36.42   40.27   42.98   51.18
25    16.47   24.34   34.38   37.65   41.57   44.31   52.62
26    17.29   25.34   35.56   38.89   42.86   45.64   54.05
27    18.11   26.34   36.74   40.11   44.14   46.96   55.48
28    18.94   27.34   37.92   41.34   45.42   48.28   56.89
29    19.77   28.34   39.09   42.56   46.69   49.59   58.30
30    20.60   29.34   40.26   43.77   47.96   50.89   59.70


Table III is abridged from Table IV of Fisher and Yates: Statistical
Tables for Biological, Agricultural and Medical Research, published by Oliver &
Boyd, Ltd., Edinburgh, by permission of the authors and publishers.




TABLE IV 


Table of t 


                          Probability.
 ν     0.50    0.10    0.05    0.02    0.01    0.001

 1     1.00    6.31   12.71   31.82   63.66   636.62
 2     0.82    2.92    4.30    6.97    9.93    31.60
 3     0.77    2.35    3.18    4.54    5.84    12.94
 4     0.74    2.13    2.78    3.75    4.60     8.61
 5     0.73    2.02    2.57    3.37    4.03     6.86
 6     0.72    1.94    2.45    3.14    3.71     5.96
 7     0.71    1.90    2.37    3.00    3.50     5.41
 8     0.71    1.86    2.31    2.90    3.36     5.04
 9     0.70    1.83    2.26    2.82    3.25     4.78
10     0.70    1.81    2.23    2.76    3.17     4.59
11     0.70    1.80    2.20    2.72    3.11     4.44
12     0.70    1.78    2.18    2.68    3.06     4.32
13     0.69    1.77    2.16    2.65    3.01     4.22
14     0.69    1.76    2.15    2.62    2.98     4.14
15     0.69    1.75    2.13    2.60    2.95     4.07
16     0.69    1.75    2.12    2.58    2.92     4.02
17     0.69    1.74    2.11    2.57    2.90     3.97
18     0.69    1.73    2.10    2.55    2.88     3.92
19     0.69    1.73    2.09    2.54    2.86     3.88
20     0.69    1.73    2.09    2.53    2.85     3.85
21     0.69    1.72    2.08    2.52    2.83     3.83
22     0.69    1.72    2.07    2.51    2.82     3.79
23     0.69    1.71    2.07    2.50    2.81     3.77
24     0.69    1.71    2.06    2.49    2.80     3.75
25     0.68    1.71    2.06    2.49    2.79     3.73
26     0.68    1.71    2.06    2.48    2.78     3.71
27     0.68    1.70    2.05    2.47    2.77     3.69
28     0.68    1.70    2.05    2.47    2.76     3.67
29     0.68    1.70    2.05    2.46    2.76     3.66
30     0.68    1.70    2.04    2.46    2.75     3.65
 ∞     0.67    1.65    1.96    2.33    2.58     3.29


Table IV is abridged from Table III of Fisher and Yates: Statistical
Tables for Biological, Agricultural and Medical Research, published by Oliver &
Boyd, Ltd., Edinburgh, by permission of the authors and publishers.





TABLE V 

Values of the Correlation Coefficient for Different Levels of Significance 


  ν      0.1     0.05    0.02    0.01

  5     0.669   0.755   0.833   0.875
  6     0.622   0.707   0.789   0.834
  7     0.582   0.666   0.750   0.798
  8     0.549   0.632   0.716   0.765
  9     0.521   0.602   0.685   0.735
 10     0.497   0.576   0.658   0.708
 11     0.476   0.553   0.634   0.684
 12     0.458   0.532   0.612   0.661
 13     0.441   0.514   0.592   0.641
 14     0.426   0.497   0.574   0.623
 15     0.412   0.482   0.558   0.606
 16     0.400   0.468   0.543   0.590
 17     0.389   0.456   0.529   0.575
 18     0.378   0.444   0.516   0.561
 19     0.369   0.433   0.503   0.549
 20     0.360   0.423   0.492   0.537
 25     0.323   0.381   0.445   0.487
 30     0.296   0.349   0.409   0.449
 35     0.275   0.325   0.381   0.418
 40     0.257   0.304   0.358   0.393
 45     0.243   0.288   0.338   0.372
 50     0.231   0.273   0.322   0.354
 60     0.211   0.250   0.295   0.325
 70     0.195   0.232   0.274   0.302
 80     0.183   0.217   0.257   0.283
 90     0.173   0.205   0.242   0.267
100     0.164   0.195   0.230   0.254


Table V is abridged from Table VI of Fisher and Yates: Statistical 
Tables for Biological, Agricultural and Medical Research, published by Oliver & 
Boyd, Ltd., Edinburgh, by permission of the authors and publishers. 
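The entries of Table V are connected with those of Table IV. The sketch below assumes the standard relation r = t/√(t² + ν) between a significant value of t and the corresponding value of r for the same ν; because the tabulated t is rounded to two decimals, agreement is only to about the third figure. The function name is illustrative.

```python
import math


def significant_r(t, nu):
    """Value of r that is just significant, given the tabulated t for nu d.f."""
    return t / math.sqrt(t * t + nu)


# Values of t at the 0.05 level taken from Table IV.
r_5 = significant_r(2.57, 5)
r_10 = significant_r(2.23, 10)
r_20 = significant_r(2.09, 20)
print(round(r_5, 3), round(r_10, 3), round(r_20, 3))
```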





TABLE VI 
The Variance Ratio (i) 



Table VI is abridged from Table V of Fisher and Yates: Statistical Tables for Biological,
Agricultural and Medical Research, published by Oliver & Boyd, Ltd., Edinburgh, by permission of the
authors and publishers.









